Any explanation on how RAG works? #8

bilogic · 2026-06-11T15:32:04Z

Hi!

I'm trying to figure out how this RAG setup works and can't connect a few things, so let me explain.

The 3 key things I know of are:

1. embeddings provider, seems you can use embeddinggemma locally
2. vector database, I think it is LanceDB
3. LLM, seems like you can pick any from opencode

It seems weird to me that any LLM in item 3 can work with embeddinggemma. I was under the impression that if an LLM doesn't have a matching embeddings provider, it cannot be used for RAG. How does this actually work? Or did I misunderstand?
How does the remote LLM actually search the local vector database?

Thanks!

MrDoe · 2026-06-11T19:18:58Z

Maintainer

I get why this feels confusing - I had the same “who talks to whom?” moment when I first built a RAG pipeline. The simplest way I think about it: embeddings and the vector DB are the search engine, the LLM is the writer. You create vector representations of your documents (indexing) and store them in a DB like LanceDB. At query time you turn the user question into an embedding vector, use that to fetch the most relevant passages, then hand those passages (as plain text) to the LLM. The LLM never needs the vectors - it only needs the retrieved text to compose an answer.

That means an LLM doesn’t have to “use the same” embedding provider to be part of RAG. What does matter is that the index and the query use the same embedding model, so that both sets of vectors live in the same vector space and the DB returns the right passages. If you index with EmbeddingGemma and also compute query embeddings with EmbeddingGemma (same model, same preprocessing), retrieval works as intended. If you mix different embedding models, the vectors are not comparable: the dimensions may not even match, and even when they do, nearest-neighbor search returns much less relevant context to feed the LLM.
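
A minimal sketch of that point, assuming a toy hashing embedder in place of EmbeddingGemma. `toyEmbed`, its `seed` parameter, and `cosineSimilarity` are made up for illustration and are not part of this plugin:

```typescript
// Toy stand-in for an embedding model (hypothetical, for illustration only).
// `seed` plays the role of "which model": different seeds place the same
// word in different vector slots, much like different models use different spaces.
function toyEmbed(text: string, dims: number, seed: number): number[] {
  const vec: number[] = [];
  for (let i = 0; i < dims; i++) vec.push(0);
  const words = text.toLowerCase().split(/\W+/).filter((w) => w.length > 0);
  for (const word of words) {
    let h = seed;
    for (let i = 0; i < word.length; i++) {
      h = (h * 31 + word.charCodeAt(i)) >>> 0; // keep the hash in uint32 range
    }
    vec[h % dims] += 1;
  }
  return vec;
}

// Cosine similarity is only defined for vectors of the same dimension.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Same "model" at index time and at query time: the vectors are comparable.
const indexedVector = toyEmbed("LanceDB stores vectors", 64, 1);
const queryVector = toyEmbed("LanceDB stores vectors", 64, 1);
console.log(cosineSimilarity(indexedVector, queryVector)); // 1 (same model, same text)

// A "different model" with another dimension cannot even be compared.
try {
  cosineSimilarity(indexedVector, toyEmbed("LanceDB stores vectors", 32, 2));
} catch (err) {
  console.log((err as Error).message); // dimension mismatch: 64 vs 32
}
```

Note that the chat LLM appears nowhere in this sketch: only the embedding model has to match between indexing and querying.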

MrDoe · 2026-06-11T19:33:08Z

Maintainer

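To answer the second question: the remote LLM never searches the local vector database itself. The search runs on your side, before the LLM is called, and only the retrieved text is sent to the LLM. This is a typical RAG pipeline: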

Prerequisite - Index all files

1. Split the files into chunks
2. Use embedding LLM to calculate the vector of each chunk
3. Store each chunk, with its vector and plain text, in the vector database.
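
The indexing steps above can be sketched roughly like this. It is only an illustration: `ChunkRecord`, `splitIntoChunks`, `indexFile`, and the fixed-size character chunking are assumptions, not the plugin's actual code, and the embedder is synchronous here for brevity (a real one would be an async HTTP call):

```typescript
// One stored row: the chunk's plain text plus its embedding vector.
interface ChunkRecord {
  id: string;
  text: string;
  vector: number[];
}

// Hypothetical embedder signature: one vector per input text.
type EmbedFn = (texts: string[]) => number[][];

// Step 1: split the file content into fixed-size chunks (naive strategy).
function splitIntoChunks(content: string, maxChars: number): string[] {
  const chunks: string[] = [];
  for (let start = 0; start < content.length; start += maxChars) {
    chunks.push(content.slice(start, start + maxChars));
  }
  return chunks;
}

// Steps 2 and 3: embed every chunk and pair it with its text for storage.
function indexFile(
  path: string,
  content: string,
  embed: EmbedFn,
  maxChars: number = 200,
): ChunkRecord[] {
  const chunks = splitIntoChunks(content, maxChars);
  const vectors = embed(chunks);
  return chunks.map((text, i) => ({
    id: `${path}#${i}`,
    text,
    vector: vectors[i],
  }));
}

// Usage with a fake embedder; the resulting records are what would be
// written to the vector database.
const fakeEmbed: EmbedFn = (texts) => texts.map((t) => [t.length]);
const records = indexFile("README.md", "x".repeat(450), fakeEmbed);
console.log(records.length); // 3
```

Storing the plain text next to the vector is what lets the search step return readable passages later.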

Semantic Search

1. User prompt is sent to the embedding LLM.
   -> Embedding vector is generated.
2. Find nearest neighbors of this vector in vector space to get the most relevant chunks.
3. Pass the chunks (as text) to the LLM.
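
As a rough sketch of the search steps, assuming the query has already been embedded and using a brute-force scan in place of LanceDB. `StoredChunk`, `search`, and `buildPrompt` are hypothetical names, and the prompt wording is just one common way to hand chunks to an LLM, not necessarily what this plugin does:

```typescript
interface StoredChunk {
  text: string;
  vector: number[];
}

// Cosine similarity between two vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Step 2: nearest neighbors of the query vector (brute force over all chunks).
function search(queryVector: number[], store: StoredChunk[], topK: number): StoredChunk[] {
  return store
    .map((chunk) => ({ chunk, score: cosine(queryVector, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
    .map((hit) => hit.chunk);
}

// Step 3: the LLM receives plain text only, never the vectors.
function buildPrompt(question: string, chunks: StoredChunk[]): string {
  const context = chunks.map((c, i) => `[${i + 1}] ${c.text}`).join("\n");
  return `Answer the question using the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}

// Usage with hand-written 2-dimensional vectors.
const store: StoredChunk[] = [
  { text: "LanceDB is the vector database.", vector: [1, 0] },
  { text: "Unrelated release notes.", vector: [0, 1] },
];
const hits = search([0.9, 0.1], store, 1);
console.log(buildPrompt("Which database is used?", hits));
```

A real vector database replaces the brute-force scan with an index, but the inputs and outputs stay the same: a query vector goes in, text chunks come out.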

MrDoe · 2026-06-11T19:37:32Z

Maintainer

I've set up "Discussions" for questions in the repo now.

bilogic · 2026-06-12T00:38:07Z

Author

Thanks! Can this issue be converted to a discussion? I have seen it happen on other repos.

For clarity, I would like to name these:

1. embeddings provider, seems you can use embeddinggemma locally
2. vector database, I think it is LanceDB
3. LLM, seems like you can pick any from opencode, let's use big pickle from opencode

Prerequisite - Index all files

...
2. Use embedding LLM to calculate the vector of each chunk
...

Semantic Search
1. User prompt is sent to the embedding LLM.
  ...

1. I see a new term, "embedding LLM", in Prerequisite and Semantic Search.
2. Do they both refer to embeddinggemma or to one of the LLMs in opencode, e.g. big pickle?
3. Are you able to point me to the code where the user prompt is sent to opencode in return for an embedding vector?

3 replies

MrDoe Jun 12, 2026
Maintainer

I would rather call it an "Embedding Model" or "Embedding LLM". The embedding provider is Ollama or any OpenAI-compatible LLM host.
Of course you can use other embedding LLMs; embeddinggemma is just a suggestion because it is very lightweight. manutic/nomic-embed-code performs better for code retrieval, but then you'll need a GPU, because inference on a CPU is too slow. See here for other embedding models: https://ollama.com/search?c=embedding
The vector database is LanceDB, because it is very small and quite performant.
Yes, the "normal" LLM can be picked in OpenCode. I would prefer DeepSeek Flash or Mimo V2.5 Free over Big Pickle if you want to use free models.

For 3.:
The general flow is: user message → retrieve() → embedder.embed() → HTTP POST to embedding provider.

Key locations:

Hook entry: src/plugin.ts:556
The chat.message hook extracts user text and calls retrieve().
Retrieval + embedding: src/retriever/retriever.ts:19
const embeddings = await embedder.embed([query]);
This is where the user's query text is sent for embedding.
Send query text to embedding provider:

Ollama provider: src/embedder/ollama.ts:46 - POSTs to ${baseUrl}/embed with { model, input: texts }
OpenAI provider: src/embedder/openai.ts:24 - POSTs to ${baseUrl}/embeddings with { model, input: texts }

Vector search: src/retriever/retriever.ts:30
The returned embedding is passed to store.search(embedding, topK) against LanceDB.
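
To make the two provider calls more concrete, here is a hedged sketch of the request and response shapes only. The helper names are hypothetical and this is not the code in src/embedder; it assumes `baseUrl` already includes the `/api` (Ollama) or `/v1` (OpenAI-compatible) prefix, and the response shapes follow the public Ollama and OpenAI embeddings APIs:

```typescript
interface EmbeddingRequest {
  url: string;
  body: { model: string; input: string[] };
}

// Ollama: POST {baseUrl}/embed with { model, input }.
function ollamaRequest(baseUrl: string, model: string, texts: string[]): EmbeddingRequest {
  return { url: `${baseUrl}/embed`, body: { model, input: texts } };
}

// OpenAI-compatible: POST {baseUrl}/embeddings with { model, input }.
function openAiRequest(baseUrl: string, model: string, texts: string[]): EmbeddingRequest {
  return { url: `${baseUrl}/embeddings`, body: { model, input: texts } };
}

// Ollama answers with { embeddings: number[][] }, one vector per input.
function parseOllamaResponse(json: { embeddings: number[][] }): number[][] {
  return json.embeddings;
}

// OpenAI answers with { data: [{ embedding, index }, ...] }; sort by index
// so the vectors line up with the input order.
function parseOpenAiResponse(json: {
  data: { embedding: number[]; index: number }[];
}): number[][] {
  return json.data
    .slice()
    .sort((a, b) => a.index - b.index)
    .map((item) => item.embedding);
}

// Usage: the body would be sent as JSON in an HTTP POST to `url`.
const request = ollamaRequest("http://localhost:11434/api", "embeddinggemma", [
  "How does the retriever work?",
]);
console.log(request.url); // http://localhost:11434/api/embed
console.log(JSON.stringify(request.body));
```

Either way, the only thing sent to the embedding provider is the query text, and the only thing that comes back is a vector; OpenCode's chat LLM is not involved in this step.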

bilogic Jun 15, 2026
Author

Semantic Search

1. User prompt is sent to the embedding LLM.
   -> Embedding vector is generated.
2. Find nearest neighbors of this vector in vector space to get the most relevant chunks.
3. Pass the chunks (as text) to the LLM.

I'm still unsure when the LLM (big pickle) comes into the picture...

In step 3, do you mean sending big pickle a prompt similar to "Please answer (user-prompt) using these pieces of relevant information (chunk1) (chunk2) (chunk3)"?

MrDoe Jun 15, 2026
Maintainer

There are now several options and functions that control how embeddings and chunks are used. I'll update the documentation to make it clearer when I have the time.
