Try the future of RAG. Multimodal RAG effortlessly brings PDFs, images, and text together in one index: ingest → retrieve → rerank → (optionally) answer with sources.
Traditional RAG is great when everything is plain text. Real documents aren't.
PDFs often contain:
- charts and tables
- scanned pages / images of text
- figures with crucial labels
- context split across captions + visuals
Multimodal RAG lets you retrieve from both:
- plain text (searchable chunks)
- rendered page images / standalone images (visual understanding)
That means you can ask questions like:
- "Which page contains the table with latency numbers?"
- "Where is the screenshot that shows the error dialog?"

…and get results that actually reflect the document as it exists, not just what text extraction managed to capture.
You can even search the index using example images, not just with text queries!
This demo uses Apache-2.0 Qwen3-VL models designed specifically for vision-language embedding and reranking:
- Qwen3-VL Embedding: turns text + images into vectors in the same semantic space
  → you can retrieve "the right page image" even if the answer isn't explicitly in the extracted text.
- Qwen3-VL Reranker (optional): reorders the initial candidates with a stronger cross-encoder
  → better top results, fewer "almost relevant" hits.
- Ingest PDFs (page images + extracted text), images, and text files
- Multimodal embeddings with Qwen3-VL
- Multimodal reranker support (toggleable) for improved ordering
- Text or image queries (or both)
- Optional answer generation with source context:
  - fully local via `transformers`, or
  - via an OpenAI-compatible API (local or remote)
- Disk-backed index for quick restarts
- Web UI at `/` for ingest + query
Open the UI after launch: http://localhost:8000/
Start the server:

```shell
uv run qwen3vl_multimodal_rag_server.py
```

Enable local answer generation:

```shell
uv run qwen3vl_multimodal_rag_server.py --enable-generator
```

Generate via an OpenAI-compatible local endpoint:

```shell
uv run qwen3vl_multimodal_rag_server.py \
  --enable-generator \
  --generator-backend openai \
  --generator-remote-endpoint http://localhost:1234/v1
```

Tip: set `OPENAI_API_KEY` in a `.env` file. Many local servers accept any value.

Generate via a remote OpenAI-compatible API:

```shell
uv run qwen3vl_multimodal_rag_server.py \
  --enable-generator \
  --generator-backend openai \
  --generator-remote-model gpt-5-mini \
  --generator-remote-reasoning-effort minimal
```

Create a `.env` with `OPENAI_API_KEY=...` before starting.
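A minimal `.env` for the remote backend (the key name comes from the tip above; the value is a placeholder):

```
OPENAI_API_KEY=sk-your-key-here
```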
- Drop a few PDFs / images
- Ask a question
- See retrieved sources (text chunks + page images)
- Turn on generation when you want an answer with citations
All available CLI parameters (defaults shown):
| Flag | Default | Description |
|---|---|---|
| `--host` | `0.0.0.0` | Bind address for the server. |
| `--port` | `8000` | Port for the server. |
| `--log-level` | `info` | Logging level (`debug`, `info`, `warning`, `error`). |
| `--data-dir` | `./qwen3vl_rag_data` | Storage directory for the index, originals, and derived assets. |
| `--device` | `auto` | Device selection (`auto`, `cuda`, `mps`, `cpu`). |
| `--embedder-model` | `Qwen/Qwen3-VL-Embedding-2B` | Embedding model name. |
| `--embedder-quant` | `off` | Embedder quantization (`off`, `8bit`, `4bit`). |
| `--reranker-model` | `Qwen/Qwen3-VL-Reranker-2B` | Reranker model name. |
| `--reranker-quant` | `off` | Reranker quantization (`off`, `8bit`, `4bit`). |
| `--disable-reranker` | `false` | Disable reranking. |
| `--generator-model` | `Qwen/Qwen3-VL-2B-Instruct` | Local generator model name. |
| `--generator-quant` | `off` | Generator quantization (`off`, `8bit`, `4bit`). |
| `--enable-generator` | `false` | Enable answer generation. |
| `--generator-backend` | `transformers` | Generator backend (`transformers`, `openai`). |
| `--generator-remote-endpoint` | `None` | Base URL for OpenAI-compatible APIs. |
| `--generator-remote-model` | `None` | Model name for OpenAI-compatible APIs. |
| `--generator-remote-reasoning-effort` | `medium` | Reasoning effort for the remote generator (`none`, `minimal`, `low`, `medium`, `high`, `xhigh`). |
| `--text-chunk-size` | `1200` | Text chunk size during ingestion. |
| `--text-chunk-overlap` | `200` | Overlap between text chunks. |
| `--pdf-dpi` | `75` | DPI for rendering PDF pages (also used to downscale images). |
| `--pdf-max-pages` | `0` | If >0, only ingest the first N pages of a PDF. |
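`--text-chunk-size` and `--text-chunk-overlap` describe a standard sliding-window chunker: each chunk starts `size - overlap` characters after the previous one, so neighboring chunks share `overlap` characters of context. A minimal sketch of that behavior (illustrative only, not the server's actual code):

```python
def chunk_text(text: str, size: int = 1200, overlap: int = 200) -> list[str]:
    """Split text into windows of `size` characters, stepping
    `size - overlap` characters forward each time."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "x" * 3000
chunks = chunk_text(doc)
print(len(chunks), [len(c) for c in chunks])  # 3 chunks; the last is shorter
```

The shared overlap keeps a sentence that straddles a chunk boundary retrievable from at least one chunk.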
- `.env` is loaded automatically at startup (for `OPENAI_API_KEY` or other env vars).
- Retrieval defaults: `top_k=4`, `rerank_k=4` (can be overridden per query in the API).
- The UI exposes `/ingest` and `/query` and shows metadata from `/health`.
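The retrieval defaults can be overridden per query. The request schema isn't documented here, so the JSON field names below (`query`, `top_k`, `rerank_k`) are assumptions matching the defaults listed above; a standard-library sketch of building such a request (the actual call is left commented so the snippet runs without the server):

```python
import json
from urllib import request

payload = {
    "query": "Which page contains the table with latency numbers?",
    "top_k": 8,       # assumed field name; server default is 4
    "rerank_k": 8,    # assumed field name; server default is 4
}

req = request.Request(
    "http://localhost:8000/query",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# With the server running, send it like this:
# with request.urlopen(req) as resp:
#     print(json.load(resp))

print(req.full_url, json.loads(req.data)["top_k"])
```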