This PR adds the initial content of the [GenAI Search High Availability document](https://docs.google.com/document/d/1bi3KpzvoLpQP1wmMvnYS5-kV8fV0L-WmTQcwaWKmoe4/edit?usp=sharing).
Fixing the structure of the initial content.
This PR focuses on cleaning up and reorganizing the GenAI search high availability reference architecture.

### What was done

- Added a new Vector search optimization section
- Reorganized the content to present use cases first, followed by vector search optimization, and then the architecture section
- Organized considerations sections into a new _Important considerations_ section to be more concise and less wordy with headings
- Added links to related documentation pages
- Moved and reframed a paragraph to improve narrative flow: "Promoting the multi–availability zone..." description to serve as the introduction of the Architecture section
- Applied substitutions and minor language cleanup

Some additional language and style cleanup is still needed, along with more links to relevant documentation and resources.

---------

Co-authored-by: Edu González de la Herrán <25320357+eedugon@users.noreply.github.com>
merging directly
fixing links
This PR contains small content refinements on the GenAI High Availability page.
This PR fixes formatting issues in the reference architectures table to ensure bullet points render, and updates the hardware specifications table with additional formatting fixes.
Data tiering section rewritten for better flow. Kibana telemetry changed to Kibana monitoring data plus extra small refinement and links. Images updated.

---------

Co-authored-by: kosabogi <105062005+kosabogi@users.noreply.github.com>
Vale Linting Results

Summary: 6 warnings, 3 suggestions found

| File | Line | Rule | Message |
|---|---|---|---|
| deploy-manage/reference-architectures.md | 35 | Elastic.DontUse | Don't use 'and/or'. |
| deploy-manage/reference-architectures.md | 35 | Elastic.DontUse | Don't use 'and/or'. |
| deploy-manage/reference-architectures.md | 35 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'and so on' instead of 'etc'. |
| deploy-manage/reference-architectures/genai-search-high-availability.md | 27 | Elastic.DontUse | Don't use 'and/or'. |
| deploy-manage/reference-architectures/genai-search-high-availability.md | 143 | Elastic.Latinisms | Latin terms and abbreviations are a common source of confusion. Use 'versus' instead of 'vs'. |
| deploy-manage/reference-architectures/genai-search-high-availability.md | 170 | Elastic.Spelling | 'tiering' is a possible misspelling. |
💡 Suggestions (3)
| File | Line | Rule | Message |
|---|---|---|---|
| deploy-manage/reference-architectures/genai-search-high-availability.md | 68 | Elastic.Wordiness | Consider using 'also' instead of 'In addition'. |
| deploy-manage/reference-architectures/genai-search-high-availability.md | 76 | Elastic.WordChoice | Consider using 'can, might' instead of 'may', unless the term is in the UI. |
| deploy-manage/reference-architectures/genai-search-high-availability.md | 137 | Elastic.Wordiness | Consider using 'because' instead of 'since'. |
The Vale linter checks documentation changes against the Elastic Docs style guide.
To use Vale locally or report issues, refer to Elastic style guide for Vale.
| Type | {{aws}} | Azure | GCP | Physical |
| :---- | :---- | :---- | :---- | :---- |
| hot | c6gd | f32sv2 | N2 | 16-32 vCPU, 64-256 GB RAM, 2-6 TB NVMe SSD |
This is more of a question, as I haven't dug in here myself. The c8 and m8 instances are newer and much better for at least the hot nodes, and probably for some of the other workloads here as well, and they have been out for more than a year. Is this information out of date, or do we specifically reference these for customers here?
What we have discovered through building these reference architecture pages is that you will inevitably run into a situation where the information you have presented is out of date. While we absolutely should correct it to be the latest and greatest when we find it, this will be an ongoing challenge with these pages, and we should probably include some wording as mitigation to say: hey, there may be newer information; this was updated as of...date.
Is this information out of date?
Probably yes!
@john-wagster, @bradquarry, this is a very good topic. We definitely need to try to make this document stable and consistent, without needing to update it very often due to things like hardware updates.
I'll try to refine it a bit and maybe remove the table, or present it as an example with a link to AWS instance types for updated content.
If you have any suggestions or general guidance to provide, let me know.
like choose our datahot instance types as our suggested reference
That seems reasonable to me for the hot nodes. It also seems reasonable to me for the ML nodes, but honestly I'm not sure, and in general I don't really know here.
@john-wagster , @bradquarry , I've updated the section to solve this. Let me know your thoughts. I think the key is:
For GenAI search workloads in {{ech}}, use Vector Search Optimized profiles as the primary reference, and consider CPU Optimized profiles for workloads with higher CPU and disk requirements.
And then link to these 3 docs that I didn't know we had :)
- Selecting the right configuration for you on {{aws}}
- Selecting the right configuration for you on Azure
- Selecting the right configuration for you on GCP
Because of that, and because the official AWS doc suggests r6gd for Vector Search, I've added r6gd and c8gd as the current suggestion (depending on the user's needs), with a final sentence such as:
These recommendations provide a practical baseline, but available instance families evolve over time as newer provider hardware becomes available. For additional guidance on selecting {{ecloud}} hardware profiles for specific workloads, refer to:
(and the 3 links shared earlier).
@eedugon I added a few more comments for some simple changes, but otherwise I'm ok with this current state.
…bility.md Co-authored-by: David Kilfoyle <41695641+kilfoyle@users.noreply.github.com>
kilfoyle left a comment
LGTM! 🚢
Very nice! I added just a few super minor thoughts, but overall this looks superb.
@kilfoyle Thank you for all the suggestions!
szabosteve left a comment
Very helpful and solid content. I left a few suggestions.
This reference architecture illustrates a production-grade, highly available GenAI search solution built on {{es}}. It shows the physical deployment model, logical integration points, and key best practices for implementing a retrieval layer that grounds generative AI responses.

{{es}} can combine [lexical search](/solutions/search/full-text.md), [dense vector search](/solutions/search/vector/dense-vector.md), [sparse vector search](/solutions/search/vector/sparse-vector.md), temporal and geospatial filtering, and hybrid ranking techniques. These capabilities form the foundation for [Retrieval Augmented Generation (RAG)](/solutions/search/rag.md), [agentic workflows](/explore-analyze/ai-features/elastic-agent-builder.md), and AI-assisted applications.

I miss some reader orientation here. The doc jumps quickly into detail.
Would it be possible to add a short section after the intro? Something like:
What you’ll learn:
- Use cases
- Architecture
- Hardware spec
- Considerations
Or something similar that helps the reader to see what they will learn about.
I completely agree: some kind of "what you'll learn" section that might also serve as a summary and index to the rest of the content.
@bradquarry , what do you think?
@szabosteve / @kosabogi, if you have time feel free to give it a try and propose something directly here, or we could defer it for a future PR, because I'm a bit lost on this topic.
This is a good suggestion. At this point, though, I don't want to pursue structural changes to the document that would also require changes to the other reference architecture document to ensure continuity, because I want to cut down on publish time. I can take this as a modification to both RA documents as a next step, but this is already 3-4 months late vs. what leadership is asking for, and we need to focus on critical wording gaps right now.
Also, we already have all the headings on the right as quick links under "On this page". Someone can just scan that to see what is in the document without us putting it inline.
Updating dense or sparse vector data can be more resource-intensive than updating keyword-based fields, since embeddings often need to be regenerated. For applications with frequent document updates, plan for additional indexing throughput and consider whether embeddings should be pre-computed, updated asynchronously, or generated on demand.

## General considerations

We should add an intro sentence here to avoid having an H2 and an H3 immediately after.
This exists in other parts of the document and looks ok to me. I really don't want to be adding more wording as the document is already far longer than originally intended.
## GenAI search use cases

The GenAI search – high availability architecture is intended for organizations that:

Can we group these 9 bullet points into categories to reduce cognitive load? For example: retrieval needs, AI application types, infrastructure and security needs, integrations. Or any other logical buckets. Some of the bullets are pretty dense. I suggest breaking them into two items when possible.
I agree the load is heavy. Good suggestion. Here is a 5-bullet rewrite, similar to the other reference architecture page:
- Require high-performance, low-latency retrieval across large, diverse datasets with highly relevant results at scale.
- Need lexical, vector, semantic, temporal, hybrid, or multimodal search across text, code, images, video, and geospatial content.
- Power assistants, agents, and agentic workflows using RAG and MCP, where grounding models in the most relevant information is essential. RAG in Elastic is explicitly built around retrieving relevant context, and MCP is an open standard for connecting AI applications to external data and tools.
- Integrate with foundation models and LLM frameworks, while improving relevance with re-ranking, filtering, faceting, highlighting, personalization, and metadata-aware retrieval.
- Support secure multi-tenant deployments, agent memory, and domain copilots such as observability and SOC assistants.
Thanks a lot @szabosteve! Really good feedback and findings. I'll look into it next Tuesday, as I'm on PTO until then.
…bility.md Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>
New doc for the reference architecture section.
Replaces #5073
Based on the content prepared by the SA team.
PREVIEW: deploy-manage/reference-architectures/genai-search-high-availability.md
Closes https://github.com/elastic/docs-content-internal/issues/20