
Reference architecture GenAI - final review #5510

Open · eedugon wants to merge 29 commits into main from reference_arch_genai_final_review

@eedugon (Contributor) commented Mar 16, 2026

New doc for the reference architecture section.

Replaces #5073

Based on the content prepared by the SA team.

PREVIEW: deploy-manage/reference-architectures/genai-search-high-availability.md

Closes https://github.com/elastic/docs-content-internal/issues/20

kosabogi and others added 10 commits February 6, 2026 15:51
This PR focuses on cleaning up and reorganizing the GenAI search high
availability reference architecture.

### What was done

- Added a new Vector search optimization section
- Reorganized the content to present use cases first, followed by vector
search optimization, and then the architecture section
- Consolidated the considerations sections into a new _Important
considerations_ section with headings, making it more concise and less wordy
- Added links to related documentation pages
- Moved and reframed the "Promoting the multi–availability zone..."
description so that it introduces the Architecture section and improves
narrative flow
- Applied substitutions and minor language cleanup

Some additional language and style cleanup is still needed, along with
more links to relevant documentation and resources.

---------

Co-authored-by: Edu González de la Herrán <25320357+eedugon@users.noreply.github.com>
merging directly
This PR contains small content refinements on the GenAI High
Availability page.
This PR fixes formatting issues in the reference architectures table to
ensure bullet points render, and updates the hardware specifications
table with additional formatting fixes.
Data tiering section rewritten for better flow.

Changed "Kibana telemetry" to "Kibana monitoring data", plus some small
refinements and links.

Images updated.

---------

Co-authored-by: kosabogi <105062005+kosabogi@users.noreply.github.com>
@eedugon eedugon requested a review from a team as a code owner March 16, 2026 09:49
@eedugon changed the title from "Reference arch genai final review" to "Reference architecture GenAI - final review" Mar 16, 2026
github-actions bot commented Mar 16, 2026

Vale Linting Results

Summary: 6 warnings, 3 suggestions found

⚠️ Warnings (6)

| File | Line | Rule | Message |
| :---- | :---- | :---- | :---- |
| deploy-manage/reference-architectures.md | 35 | Elastic.DontUse | Don't use 'and/or'. |
| deploy-manage/reference-architectures.md | 35 | Elastic.DontUse | Don't use 'and/or'. |
| deploy-manage/reference-architectures.md | 35 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'and so on' instead of 'etc'. |
| deploy-manage/reference-architectures/genai-search-high-availability.md | 27 | Elastic.DontUse | Don't use 'and/or'. |
| deploy-manage/reference-architectures/genai-search-high-availability.md | 143 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'versus' instead of 'vs'. |
| deploy-manage/reference-architectures/genai-search-high-availability.md | 170 | Elastic.Spelling | 'tiering' is a possible misspelling. |

💡 Suggestions (3)

| File | Line | Rule | Message |
| :---- | :---- | :---- | :---- |
| deploy-manage/reference-architectures/genai-search-high-availability.md | 68 | Elastic.Wordiness | Consider using 'also' instead of 'In addition'. |
| deploy-manage/reference-architectures/genai-search-high-availability.md | 76 | Elastic.WordChoice | Consider using 'can, might' instead of 'may', unless the term is in the UI. |
| deploy-manage/reference-architectures/genai-search-high-availability.md | 137 | Elastic.Wordiness | Consider using 'because' instead of 'since'. |

The Vale linter checks documentation changes against the Elastic Docs style guide.

To use Vale locally or report issues, refer to Elastic style guide for Vale.

github-actions bot commented Mar 16, 2026

@eedugon eedugon requested a review from bradquarry March 16, 2026 09:59

| Type | {{aws}} | Azure | GCP | Physical |
| :---- | :---- | :---- | :---- | :---- |
| hot | c6gd | f32sv2 | N2 | 16-32 vCPU, 64-256 GB RAM, 2-6 TB NVMe SSD |
@john-wagster (Contributor) commented Mar 16, 2026

This is more of a question; I haven't dug into this myself. The c8 and m8 instances are newer, much better for at least the hot nodes (and probably for some of the other workloads here as well), and have been available for more than a year. Is this information out of date, or do we reference these instances specifically for customers here?

@bradquarry (Contributor) commented Mar 16, 2026

What we have discovered while building these reference architecture pages is that you will inevitably run into situations where the information you have presented is out of date. While we absolutely should correct it to the latest and greatest when we find it, this will be an ongoing challenge with these pages. We should probably include some mitigating wording, such as "There might be newer information. This was updated as of [date]."

@eedugon (Contributor Author) commented Mar 16, 2026

> Is this information out of date?

Probably yes!

@john-wagster, @bradquarry, this is a very good topic. We definitely need to make this document stable and consistent, without needing to update it very often because of things like hardware updates.

I'll try to refine it a bit and maybe remove the table, or present it as an example with a link to the AWS instance types for up-to-date content.

If you have any suggestions or general guidance, let me know.

@john-wagster (Contributor) commented Mar 16, 2026

> like choose our data hot instance types as our suggested reference

That seems reasonable to me for the hot nodes. It might also be reasonable for the ML nodes, but honestly I'm not sure, and in general I don't really know this area well.

@eedugon (Contributor Author) commented Mar 18, 2026

@john-wagster, @bradquarry, I've updated the section to solve this. Let me know your thoughts. I think the key is:

> For GenAI search workloads in {{ech}}, use Vector Search Optimized profiles as the primary reference, and consider CPU Optimized profiles for workloads with higher CPU and disk requirements.

And then link to these 3 docs that I didn't know we had :)

Because of that, and because our official AWS doc suggests r6gd for Vector Search, I've added r6gd and c8gd as the current suggestions (depending on the user's needs), along with a final sentence such as:

> These recommendations provide a practical baseline, but available instance families evolve over time as newer provider hardware becomes available. For additional guidance on selecting {{ecloud}} hardware profiles for specific workloads, refer to:

(followed by the 3 links shared earlier).

@bradquarry (Contributor)

@eedugon I added a few more comments with some simple changes, but otherwise I'm OK with the current state.

…bility.md

Co-authored-by: David Kilfoyle <41695641+kilfoyle@users.noreply.github.com>
@kilfoyle (Contributor) left a comment

LGTM! 🚢
Very nice! I added just a few super minor thoughts, but overall this looks superb.

bradquarry and others added 6 commits March 18, 2026 16:31
…bility.md

Co-authored-by: David Kilfoyle <41695641+kilfoyle@users.noreply.github.com>
…bility.md

Co-authored-by: David Kilfoyle <41695641+kilfoyle@users.noreply.github.com>
…bility.md

Co-authored-by: David Kilfoyle <41695641+kilfoyle@users.noreply.github.com>
…bility.md

Co-authored-by: David Kilfoyle <41695641+kilfoyle@users.noreply.github.com>
…bility.md

Co-authored-by: David Kilfoyle <41695641+kilfoyle@users.noreply.github.com>
…bility.md

Co-authored-by: David Kilfoyle <41695641+kilfoyle@users.noreply.github.com>
@bradquarry (Contributor)

@kilfoyle Thank you for all the suggestions!

…bility.md

Co-authored-by: David Kilfoyle <41695641+kilfoyle@users.noreply.github.com>
@szabosteve (Contributor) left a comment

Very helpful and solid content. I left a few suggestions.

This reference architecture illustrates a production-grade, highly available GenAI search solution built on {{es}}. It shows the physical deployment model, logical integration points, and key best practices for implementing a retrieval layer that grounds generative AI responses.

{{es}} can combine [lexical search](/solutions/search/full-text.md), [dense vector search](/solutions/search/vector/dense-vector.md), [sparse vector search](/solutions/search/vector/sparse-vector.md), temporal and geospatial filtering, and hybrid ranking techniques. These capabilities form the foundation for [Retrieval Augmented Generation (RAG)](/solutions/search/rag.md), [agentic workflows](/explore-analyze/ai-features/elastic-agent-builder.md), and AI-assisted applications.
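"Hybrid ranking" here means merging the ranked results of several retrieval methods into one list. One widely used technique for this is reciprocal rank fusion (RRF), which Elasticsearch also offers as a retriever. The sketch below shows the idea in plain Python rather than through the Elasticsearch API: each document scores the sum of 1 / (k + rank) over every result list it appears in. The document IDs are hypothetical, and `k=60` is an assumed constant (a commonly cited default), not a value taken from this reference architecture.

```python
from collections import defaultdict


def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document's fused score is the sum of 1 / (k + rank) over every
    list it appears in, where rank is 1-based. Documents that rank well
    in several lists (for example, lexical and dense vector) rise to the top.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a lexical query and a dense vector query.
lexical = ["doc-a", "doc-b", "doc-c"]
dense = ["doc-c", "doc-a", "doc-d"]

print(reciprocal_rank_fusion([lexical, dense]))
```

Because RRF only uses rank positions, it avoids having to normalize BM25 scores and vector similarity scores onto a common scale, which is one reason it is a popular default for hybrid search.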

@szabosteve (Contributor) commented Mar 20, 2026

I'm missing some reader orientation here; the doc jumps into detail quickly.
Would it be possible to add a short section after the intro? Something like:

What you’ll learn:

  • Use cases
  • Architecture
  • Hardware spec
  • Considerations

Or something similar that helps the reader to see what they will learn about.

Contributor Author commented:

I completely agree: some kind of "What you'll learn" section could also serve as a summary and an index to the rest of the content.

@bradquarry, what do you think?

@szabosteve / @kosabogi, if you have time, feel free to give it a try and propose something directly here. Otherwise we could defer it to a future PR, because I'm a bit lost on this topic.

@bradquarry (Contributor) commented Mar 20, 2026

This is a good suggestion. At this point, though, I don't want to pursue structural changes to this document that would also require changes to the other reference architecture document to keep the two consistent, because that would delay publishing. I can take this on as a modification to both RA documents as a next step, but this is already 3-4 months behind what leadership is asking for, and we need to focus on critical wording gaps right now.

@bradquarry (Contributor) commented Mar 20, 2026

Also, all the headings are already listed on the right as quick links ("On this page"), so readers can scan them to see what the document covers without an inline list.

Updating dense or sparse vector data can be more resource-intensive than updating keyword-based fields, because embeddings often need to be regenerated. For applications with frequent document updates, plan for additional indexing throughput and consider whether embeddings should be pre-computed, updated asynchronously, or generated on demand.
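One way to contain the cost of frequent updates is to regenerate embeddings only when the embedded text actually changes, and to hand that work to an asynchronous worker instead of doing it inline with each write. The following is a minimal Python sketch under stated assumptions: the hash store, the document IDs, and the `plan_embedding_updates` helper are all hypothetical, and the worker that calls an embedding model and writes vectors back to the index is not shown.

```python
import hashlib
from queue import Queue


def content_hash(text):
    """Stable fingerprint of the text that was last embedded."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_embedding_updates(updated_docs, stored_hashes):
    """Return sorted IDs of documents whose text changed since it was last embedded.

    updated_docs:  dict of doc_id -> new text (hypothetical input)
    stored_hashes: dict of doc_id -> hash of the text that was last embedded
    """
    changed = []
    for doc_id, text in updated_docs.items():
        # New documents have no stored hash, so they are always (re-)embedded.
        if stored_hashes.get(doc_id) != content_hash(text):
            changed.append(doc_id)
    return sorted(changed)


# Hypothetical example data: doc-2 is unchanged, doc-3 is new.
stored = {"doc-1": content_hash("old text"), "doc-2": content_hash("same text")}
updated = {"doc-1": "new text", "doc-2": "same text", "doc-3": "brand-new document"}

# Queue only the changed documents; a separate worker (not shown) would
# generate their embeddings and write them back asynchronously.
embedding_queue = Queue()
for doc_id in plan_embedding_updates(updated, stored):
    embedding_queue.put(doc_id)

print(plan_embedding_updates(updated, stored))  # ['doc-1', 'doc-3']
```

Updates that only touch metadata fields leave the hash unchanged, so they skip the expensive embedding step entirely.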

## General considerations

Contributor commented:

We should add an intro sentence here to avoid having an H3 immediately after the H2.

@bradquarry (Contributor) commented Mar 20, 2026

This pattern exists in other parts of the document and looks OK to me. I really don't want to add more wording, as the document is already far longer than originally intended.

## GenAI search use cases

The GenAI search – high availability architecture is intended for organizations that:

Contributor commented:

Can we group these 9 bullet points into categories to reduce cognitive load? For example: retrieval needs, AI application types, infrastructure and security needs, integrations. Or any other logical buckets. Some of the bullets are pretty dense. I suggest breaking them into two items when possible.

Contributor commented:

I agree the load is heavy; good suggestion. Here is a five-bullet rewrite similar to the other reference architecture page:

  • Require high-performance, low-latency retrieval across large, diverse datasets with highly relevant results at scale.
  • Need lexical, vector, semantic, temporal, hybrid, or multimodal search across text, code, images, video, and geospatial content.
  • Power assistants, agents, and agentic workflows using RAG and MCP, where grounding models in the most relevant information is essential. RAG in Elastic is explicitly built around retrieving relevant context, and MCP is an open standard for connecting AI applications to external data and tools.
  • Integrate with foundation models and LLM frameworks, while improving relevance with re-ranking, filtering, faceting, highlighting, personalization, and metadata-aware retrieval.
  • Support secure multi-tenant deployments, agent memory, and domain copilots such as observability and SOC assistants.

@eedugon (Contributor Author) commented Mar 20, 2026

> Very helpful and solid content. I left a few suggestions.

Thanks a lot, @szabosteve! Really good feedback and findings. I'll look into it next Tuesday, as I'm on PTO until then.

…bility.md

Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>
bradquarry and others added 3 commits March 20, 2026 08:47
…bility.md

Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>
…bility.md

Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>
…bility.md

Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>
…bility.md

Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>

Labels: None yet

Projects: None yet

7 participants