Adds explanation on when to use dense or sparse embeddings #6727

Open: kosabogi wants to merge 1 commit into main from sparse-dense
@kosabogi
Member

Summary

This PR adds an explanation of when to use dense and sparse vectors to the "Tutorial: Dense and sparse workflows using ingest pipelines" page.

Related issue: https://github.com/elastic/docs-content-internal/issues/854

Generative AI disclosure

  1. Did you use a generative AI (GenAI) tool to assist in creating this contribution?
  • Yes
  • No
Claude in Cursor

@kosabogi requested a review from seanhandley May 27, 2026 08:40
@kosabogi requested a review from a team as a code owner May 27, 2026 08:40
@github-actions
Contributor

github-actions Bot commented May 27, 2026

🔍 Preview links for changed docs

@github-actions
Contributor

Vale Linting Results

Summary: 1 suggestion found

💡 Suggestions (1)
| File | Line | Rule | Message |
|------|------|------|---------|
| solutions/search/vector/dense-versus-sparse-ingest-pipelines.md | 54 | Elastic.FirstPerson | Use caution when using first-person pronouns such as 'my.' |

The Vale linter checks documentation changes against the Elastic Docs style guide.

To use Vale locally or report issues, refer to Elastic style guide for Vale.

Contributor

@seanhandley left a comment

Looks almost ready @kosabogi - just a couple of thoughts.

- [Natural language Q&A](/explore-analyze/machine-learning/nlp/ml-nlp-text-emb-vector-search-example.md): Match questions like "How do I reset my password?" to FAQ entries, product documentation, or policy pages.
- [Recommendations and similarity](knn.md): Find related articles, products, or media. For example, you can surface articles like the current one or visually similar product images.

Dense embeddings are a good choice when you need multilingual retrieval or a specific third-party embedding model you have already evaluated on your data.
Contributor

I'd remove this sentence. Not all dense embedding models are multilingual. We support Jina Embeddings v3, which is English only, for example.

Also, the need to use a model that's already been evaluated for a use case could apply to a sparse embedding model too. It's a question of previous technical decisions and the need to accommodate them.

Could maybe say something like:

Dense embeddings are ideal when you care more about the semantic meaning of search terms than exact keyword matches. They excel at matching synonyms and paraphrases of the original query, so the results reflect the user's intent.


Common use cases include:

- [Retrieval augmented generation (RAG)](../rag.md): Retrieve document passages that answer a user's question, even when the question and the source text use different words.
Contributor

Could be an opportunity here to mention Context Engineering more prominently than RAG?

RAG is still a useful concept but I think we're positioning ourselves in the market as a broader solution for context engineering as a whole.

Member

++ There are some good definitions in this Anthropic blog post: https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents

This could be a good spot to link to Agent Builder as an out-of-the-box toolkit.

Per the glossary definition:

"Agent Builder combines LLM reasoning with skills, tools, and best practices for context engineering and retrieval, so responses are accurately and efficiently grounded in your data."
