Reference architecture GenAI - final review by eedugon · Pull Request #5510 · elastic/docs-content

eedugon · 2026-03-16T09:49:19Z

New doc for the reference architecture section.

Replaces #5073

Based on the content prepared by the SA team.

PREVIEW: deploy-manage/reference-architectures/genai-search-high-availability.md

Closes https://github.com/elastic/docs-content-internal/issues/20

This PR adds the initial content of the [GenAI Search High Availability document](https://docs.google.com/document/d/1bi3KpzvoLpQP1wmMvnYS5-kV8fV0L-WmTQcwaWKmoe4/edit?usp=sharing).

Fixing the structure of the initial content.

This PR focuses on cleaning up and reorganizing the GenAI search high availability reference architecture.

### What was done

- Added a new Vector search optimization section
- Reorganized the content to present use cases first, followed by vector search optimization, and then the architecture section
- Organized considerations sections into a new _Important considerations_ section to be more concise and less wordy with headings
- Added links to related documentation pages
- Moved and reframed a paragraph to improve narrative flow: "Promoting the multi–availability zone..." description to serve as the introduction of the Architecture section
- Applied substitutions and minor language cleanup

Some additional language and style cleanup is still needed, along with more links to relevant documentation and resources.

---------

Co-authored-by: Edu González de la Herrán <25320357+eedugon@users.noreply.github.com>

merging directly

fixing links

This PR contains small content refinements on the GenAI High Availability page.

This PR fixes formatting issues in the reference architectures table to ensure bullet points render, and updates the hardware specifications table with additional formatting fixes.

Data tiering section rewritten for better flow. Kibana telemetry changed to Kibana monitoring data, plus an extra small refinement and links. Images updated.

---------

Co-authored-by: kosabogi <105062005+kosabogi@users.noreply.github.com>

…final_review

github-actions · 2026-03-16T09:50:16Z

Vale Linting Results

Summary: 6 warnings, 3 suggestions found

⚠️ Warnings (6)

| File | Line | Rule | Message |
| :---- | :---- | :---- | :---- |
| deploy-manage/reference-architectures.md | 35 | Elastic.DontUse | Don't use 'and/or'. |
| deploy-manage/reference-architectures.md | 35 | Elastic.DontUse | Don't use 'and/or'. |
| deploy-manage/reference-architectures.md | 35 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'and so on' instead of 'etc'. |
| deploy-manage/reference-architectures/genai-search-high-availability.md | 27 | Elastic.DontUse | Don't use 'and/or'. |
| deploy-manage/reference-architectures/genai-search-high-availability.md | 143 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'versus' instead of 'vs'. |
| deploy-manage/reference-architectures/genai-search-high-availability.md | 170 | Elastic.Spelling | 'tiering' is a possible misspelling. |

💡 Suggestions (3)

| File | Line | Rule | Message |
| :---- | :---- | :---- | :---- |
| deploy-manage/reference-architectures/genai-search-high-availability.md | 68 | Elastic.Wordiness | Consider using 'also' instead of 'In addition'. |
| deploy-manage/reference-architectures/genai-search-high-availability.md | 76 | Elastic.WordChoice | Consider using 'can, might' instead of 'may', unless the term is in the UI. |
| deploy-manage/reference-architectures/genai-search-high-availability.md | 137 | Elastic.Wordiness | Consider using 'because' instead of 'since'. |

The Vale linter checks documentation changes against the Elastic Docs style guide.

To use Vale locally or report issues, refer to Elastic style guide for Vale.

github-actions · 2026-03-16T09:53:04Z

🔍 Preview links for changed docs

john-wagster · 2026-03-16T14:36:24Z

deploy-manage/reference-architectures/genai-search-high-availability.md

+
+| Type | {{aws}} | Azure | GCP | Physical |
+| :---- | :---- | :---- | :---- | :---- |
+| hot | c6gd | f32sv2 | N2 | 16-32 vCPU, 64-256 GB RAM, 2-6 TB NVMe SSD |


This is more of a question, as I haven't dug in here myself. The c8 and m8 instances are newer, have been out for more than a year, and are much better for at least the hot nodes and probably some of the other workloads here as well. Is this information out of date, or do we specifically reference these instance types for customers here?

What we have discovered through building these reference architecture pages is that you will inevitably run into a situation where the information you have presented is out of date. While we absolutely should update it to the latest and greatest when we find it, this will be an ongoing challenge with these pages, and we should probably include some mitigating wording along the lines of: there may be newer information; this was updated as of...date.

Is this information out of date?

Probably yes!

@john-wagster, @bradquarry, this is a very good topic. We definitely need to try to make this document stable and consistent, so that it doesn't need frequent updates for things like hardware changes.

I'll try to refine it a bit and maybe remove the table, or present it as an example with a link to AWS instance types for up-to-date content.

If you have any suggestions or general guidance, let me know.

For example, we could choose our datahot instance types as our suggested reference.

That seems reasonable to me for the hot nodes. It also seems reasonable for the ML nodes, but honestly I'm not sure, and in general I don't really know here.

@john-wagster, @bradquarry, I've updated the section to address this. Let me know your thoughts. I think the key is:

For GenAI search workloads in {{ech}}, use Vector Search Optimized profiles as the primary reference, and consider CPU Optimized profiles for workloads with higher CPU and disk requirements.

And then link to these 3 docs that I didn't know we had :)

Selecting the right configuration for you on {{aws}}

Selecting the right configuration for you on Azure

Selecting the right configuration for you on GCP

Because of that, and because the official AWS doc suggests r6gd for Vector Search, I've added r6gd and c8gd as the current suggestions (depending on the user's needs), followed by a final sentence such as:

These recommendations provide a practical baseline, but available instance families evolve over time as newer provider hardware becomes available. For additional guidance on selecting {{ecloud}} hardware profiles for specific workloads, refer to:
(and the 3 links shared earlier).
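
As a rough illustration of the selection rule described in this comment, here is a minimal sketch. The class and function names are hypothetical, and pairing c8gd with the CPU Optimized profile is an assumption inferred from this thread, which only states explicitly that r6gd is suggested for Vector Search. It is not taken from the Elastic docs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileSuggestion:
    """Hypothetical container for a baseline hardware suggestion."""
    profile: str      # hardware profile name as used in this thread
    aws_family: str   # example AWS instance family mentioned in this thread


# r6gd for Vector Search is stated in the thread.
VECTOR_SEARCH = ProfileSuggestion("Vector Search Optimized", "r6gd")
# Assumption: c8gd is paired with CPU Optimized here for illustration only.
CPU_OPTIMIZED = ProfileSuggestion("CPU Optimized", "c8gd")


def suggest_profile(needs_more_cpu_and_disk: bool) -> ProfileSuggestion:
    """Return the primary reference unless CPU and disk needs are higher."""
    return CPU_OPTIMIZED if needs_more_cpu_and_disk else VECTOR_SEARCH
```

Calling `suggest_profile(False)` returns the Vector Search Optimized baseline, and `suggest_profile(True)` returns the CPU Optimized alternative. Instance families change over time, so the linked provider pages remain the source of truth.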

deploy-manage/reference-architectures/genai-search-high-availability.md

bradquarry · 2026-03-18T12:22:46Z

@eedugon I added a few more comments for some simple changes, but otherwise I'm OK with the current state.

…bility.md Co-authored-by: David Kilfoyle <41695641+kilfoyle@users.noreply.github.com>

deploy-manage/reference-architectures/genai-search-high-availability.md

kilfoyle

LGTM! 🚢
Very nice! I added just a few super minor thoughts, but overall this looks superb.

…bility.md Co-authored-by: David Kilfoyle <41695641+kilfoyle@users.noreply.github.com>

bradquarry · 2026-03-18T20:39:11Z

@kilfoyle Thank you for all the suggestions!

…bility.md Co-authored-by: David Kilfoyle <41695641+kilfoyle@users.noreply.github.com>

szabosteve

Very helpful and solid content. I left a few suggestions.

szabosteve · 2026-03-20T09:34:13Z

deploy-manage/reference-architectures/genai-search-high-availability.md

+This reference architecture illustrates a production-grade, highly available GenAI search solution built on {{es}}. It shows the physical deployment model, logical integration points, and key best practices for implementing a retrieval layer that grounds generative AI responses.
+
+{{es}} can combine [lexical search](/solutions/search/full-text.md), [dense vector search](/solutions/search/vector/dense-vector.md), [sparse vector search](/solutions/search/vector/sparse-vector.md), temporal and geospatial filtering, and hybrid ranking techniques. These capabilities form the foundation for [Retrieval Augmented Generation (RAG)](/solutions/search/rag.md), [agentic workflows](/explore-analyze/ai-features/elastic-agent-builder.md), and AI-assisted applications.
+


I miss some reader orientation here. The doc jumps quickly into detail.
Would it be possible to add a short section after the intro? Something like:

What you’ll learn:

Use cases

Architecture

Hardware spec

Considerations

Or something similar that helps the reader to see what they will learn about.

I completely agree. Some kind of "what you'll learn" section might also serve as a summary and index to the rest of the content.

@bradquarry , what do you think?

@szabosteve / @kosabogi, if you have time feel free to give it a try and propose something directly here, or we could defer it for a future PR, because I'm a bit lost on this topic.

This is a good suggestion. At this point, though, I don't want to pursue structural changes to this document that would also require changes to the other reference architecture document for continuity, in order to cut down on publish time. I can take this on as a modification to both RA documents as a next step, but this is already 3-4 months late versus what leadership is asking for, and we need to focus on critical wording gaps right now.

Also, we already have all the headings on the right as quick links ("On this page"); someone can just scan that to see what is in the document without us putting it inline.

deploy-manage/reference-architectures/genai-search-high-availability.md

szabosteve · 2026-03-20T10:13:56Z

deploy-manage/reference-architectures/genai-search-high-availability.md

+Updating dense or sparse vector data can be more resource-intensive than updating keyword-based fields, since embeddings often need to be regenerated. For applications with frequent document updates, plan for additional indexing throughput and consider whether embeddings should be pre-computed, updated asynchronously, or generated on demand.
+
+## General considerations
+
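
One way to picture the update cost described in the quoted paragraph is a minimal sketch that regenerates an embedding only when a document's text actually changes. The `embed` callable, the in-memory `store`, and the function names are hypothetical placeholders, not part of the document under review or of any Elastic API.

```python
import hashlib
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def content_hash(text: str) -> str:
    """Fingerprint the text so unchanged documents can be detected."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def upsert(
    store: Dict[str, Tuple[str, Vector]],
    doc_id: str,
    text: str,
    embed: Callable[[str], Vector],
) -> bool:
    """Store the embedding for text; return True only if embed() was called."""
    digest = content_hash(text)
    cached = store.get(doc_id)
    if cached is not None and cached[0] == digest:
        # Text is unchanged, so the (expensive) embedding step is skipped.
        return False
    store[doc_id] = (digest, embed(text))
    return True
```

With a stand-in such as `embed=lambda t: [float(len(t))]`, a second `upsert` call with identical text returns `False` without recomputing the vector. The same check could sit in front of an asynchronous queue if embeddings are updated out of band.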


We should add an intro sentence here to avoid having an H2 immediately followed by an H3.

This exists in other parts of the document and looks ok to me. I really don't want to be adding more wording as the document is already far longer than originally intended.

szabosteve · 2026-03-20T10:23:44Z

deploy-manage/reference-architectures/genai-search-high-availability.md

+## GenAI search use cases
+
+The GenAI search – high availability architecture is intended for organizations that:
+


Can we group these 9 bullet points into categories to reduce cognitive load? For example: retrieval needs, AI application types, infrastructure and security needs, integrations. Or any other logical buckets. Some of the bullets are pretty dense. I suggest breaking them into two items when possible.

I agree the load is heavy. Good suggestion. Here is a 5-bullet rewrite, similar to the other reference architecture page:

Require high-performance, low-latency retrieval across large, diverse datasets with highly relevant results at scale.

Need lexical, vector, semantic, temporal, hybrid, or multimodal search across text, code, images, video, and geospatial content.

Power assistants, agents, and agentic workflows using RAG and MCP, where grounding models in the most relevant information is essential. RAG in Elastic is explicitly built around retrieving relevant context, and MCP is an open standard for connecting AI applications to external data and tools.

Integrate with foundation models and LLM frameworks, while improving relevance with re-ranking, filtering, faceting, highlighting, personalization, and metadata-aware retrieval.

Support secure multi-tenant deployments, agent memory, and domain copilots such as observability and SOC assistants.

deploy-manage/reference-architectures/genai-search-high-availability.md

eedugon · 2026-03-20T12:24:16Z

Very helpful and solid content. I left a few suggestions.

Thanks a lot, @szabosteve! Really good feedback and findings. I'll look into it next Tuesday, as I'm on PTO until then.

…bility.md Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>

kosabogi and others added 10 commits February 6, 2026 15:51

Adds initial GenAI reference architecture draft (#5049)

8490c67

This PR adds the initial content of the [GenAI Search High Availability document](https://docs.google.com/document/d/1bi3KpzvoLpQP1wmMvnYS5-kV8fV0L-WmTQcwaWKmoe4/edit?usp=sharing).

structure changed and landing page updated with new architecture (#5081)

dadbb59

Fixing the structure of the initial content.

fixing links (#5133)

6f63b0c

merging directly

more links fixed (#5137)

68d1250

fixing links

Small content refinements (#5153)

9c83568

This PR contains small content refinements on the GenAI High Availability page.

Fixes formatting (#5280)

cc55a7d

This PR fixes formatting issues in the reference architectures table to ensure bullet points render, and updates the hardware specifications table with additional formatting fixes.

Merge remote-tracking branch 'origin/main' into reference_arch_genai_…

cb22bc1

…final_review

eedugon requested a review from a team as a code owner March 16, 2026 09:49

eedugon changed the title ~~Reference arch genai final review~~ Reference architecture GenAI - final review Mar 16, 2026

github-actions bot deployed to docs-preview March 16, 2026 09:50

eedugon mentioned this pull request Mar 16, 2026

Adds initial GenAI reference architecture draft (#5049) #5073

Closed

eedugon requested review from benwtrent and jimczi March 16, 2026 09:52

eedugon requested a review from bradquarry March 16, 2026 09:59

john-wagster reviewed Mar 16, 2026

hw recommendations updated

9df7d13

github-actions bot deployed to docs-preview March 18, 2026 10:16

eedugon requested a review from john-wagster March 18, 2026 10:23

bradquarry reviewed Mar 18, 2026

bradquarry reviewed Mar 18, 2026

patent word replaced

5bf7d08

github-actions bot deployed to docs-preview March 18, 2026 14:07

suggestions included

1c8ce53

github-actions bot deployed to docs-preview March 18, 2026 14:15

Update deploy-manage/reference-architectures/genai-search-high-availa…

78fd07e

…bility.md Co-authored-by: David Kilfoyle <41695641+kilfoyle@users.noreply.github.com>

github-actions bot deployed to docs-preview March 18, 2026 19:26

kilfoyle reviewed Mar 18, 2026

kilfoyle reviewed Mar 18, 2026

kilfoyle reviewed Mar 18, 2026

kilfoyle reviewed Mar 18, 2026

kilfoyle reviewed Mar 18, 2026

kilfoyle reviewed Mar 18, 2026

kilfoyle reviewed Mar 18, 2026

benwtrent reviewed Mar 18, 2026

kilfoyle approved these changes Mar 18, 2026

bradquarry and others added 6 commits March 18, 2026 16:31

Update deploy-manage/reference-architectures/genai-search-high-availa…

cef3bf4

…bility.md Co-authored-by: David Kilfoyle <41695641+kilfoyle@users.noreply.github.com>

Update deploy-manage/reference-architectures/genai-search-high-availa…

110516d

…bility.md Co-authored-by: David Kilfoyle <41695641+kilfoyle@users.noreply.github.com>

Update deploy-manage/reference-architectures/genai-search-high-availa…

b111e1a

…bility.md Co-authored-by: David Kilfoyle <41695641+kilfoyle@users.noreply.github.com>

Update deploy-manage/reference-architectures/genai-search-high-availa…

8238240

…bility.md Co-authored-by: David Kilfoyle <41695641+kilfoyle@users.noreply.github.com>

Update deploy-manage/reference-architectures/genai-search-high-availa…

cc03c80

…bility.md Co-authored-by: David Kilfoyle <41695641+kilfoyle@users.noreply.github.com>

Update deploy-manage/reference-architectures/genai-search-high-availa…

b4c5e0b

…bility.md Co-authored-by: David Kilfoyle <41695641+kilfoyle@users.noreply.github.com>

Update deploy-manage/reference-architectures/genai-search-high-availa…

ff63a6b

…bility.md Co-authored-by: David Kilfoyle <41695641+kilfoyle@users.noreply.github.com>

github-actions bot deployed to docs-preview March 18, 2026 20:40

szabosteve reviewed Mar 20, 2026

Update deploy-manage/reference-architectures/genai-search-high-availa…

0e165c5

…bility.md Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>

github-actions bot deployed to docs-preview March 20, 2026 12:43

bradquarry and others added 3 commits March 20, 2026 08:47

Update deploy-manage/reference-architectures/genai-search-high-availa…

0a5c4cc

…bility.md Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>

Update deploy-manage/reference-architectures/genai-search-high-availa…

cc7d491

…bility.md Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>

Update deploy-manage/reference-architectures/genai-search-high-availa…

67da2e5

…bility.md Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>

github-actions bot deployed to docs-preview March 20, 2026 12:49

Update deploy-manage/reference-architectures/genai-search-high-availa…

1f1d9f3

…bility.md Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>

github-actions bot deployed to docs-preview March 20, 2026 13:02
