I get why this feels confusing - I had the same "who talks to whom?" moment when I first built a RAG pipeline. The simplest way I think about it: embeddings and the vector DB are the search engine, the LLM is the writer.

At indexing time you create vector representations of your documents and store them in a DB like LanceDB. At query time you turn the user question into an embedding vector, use that to fetch the most relevant passages, then hand those passages (as plain text) to the LLM. The LLM never needs the vectors - it only needs the retrieved text to compose an answer. That means the LLM doesn't have to come from the same provider as the embedding model to be part of RAG.

What does matter is that the index and the query use the same embedding model, so that both sets of vectors live in the same space and the DB returns the right passages. If you index with EmbeddingGemma, compute the query embeddings with EmbeddingGemma too (same model, same preprocessing). If you mix different embedding models, the vectors are not comparable (the dimensions may not even match), nearest-neighbor search degrades, and you get less relevant context to feed the LLM.
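To make the "search engine vs. writer" split concrete, here is a minimal sketch of the flow described above. Everything in it is an assumption for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model such as EmbeddingGemma, the in-memory `index` list stands in for a vector DB like LanceDB, and `retrieve` / `build_prompt` are hypothetical helper names, not any library's API.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a sparse bag-of-words vector."""
    words = (w.strip(".,?!") for w in text.lower().split())
    return Counter(w for w in words if w)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Indexing: embed every passage once and keep the vector next to its text.
passages = [
    "LanceDB stores vectors alongside the original text.",
    "Bananas are rich in potassium.",
    "The embedding model turns text into vectors for search.",
]
index = [(embed(p), p) for p in passages]


def retrieve(question: str, k: int = 2) -> list[str]:
    """Query time: embed the question with the SAME embed() used for indexing."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[0]), reverse=True)
    return [text for _, text in ranked[:k]]


def build_prompt(question: str, k: int = 2) -> str:
    """The LLM only ever receives plain text, never the vectors."""
    context = "\n".join(f"- {p}" for p in retrieve(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


prompt = build_prompt("Where does LanceDB keep the text?")
print(prompt)
```

The string in `prompt` is what you would send to whichever LLM you like. Swapping the LLM changes nothing in `index`, while swapping `embed` would mean re-indexing every passage.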
-
To answer the second question, this is a typical RAG pipeline:
Prerequisite: index all files
Semantic search
-
I've set up "Discussions" for questions in the repo now.
-
Thanks! Can this issue be converted to a discussion? I have seen it happen on other repos. For clarity, I would like to name these:
-
Hi!
I'm trying to figure out RAG and can't connect a few things together, so let me explain.
The 3 key things I know of are:
embeddinggemma locally
Thanks!