Chat with PDFs, CSVs, TXT files, websites, and YouTube videos — powered by RAG, pluggable LLMs, and an optional fully-offline mode.
- Features
- Architecture
- Requirements
- Run with Docker
- Installation (from source)
- Configuration
- Usage scenarios
- Source-type walkthroughs
- Persistence
- Language switching
- Troubleshooting
- Development
- Contributing
- Project structure
- Roadmap / out of scope
- License
- 5 source types: PDF, CSV, TXT, website URL, YouTube URL.
- 4 LLM providers: OpenAI, Anthropic, Groq, Ollama — swappable from the UI.
- 2 embedding providers: OpenAI (cloud) and HuggingFace (fully local, multilingual).
- Chroma vector store — ephemeral or persistent, single backend, zero config.
- Persistent or ephemeral index at the user's choice.
- Model selection persistence — provider, model, embeddings, and temperature are remembered across restarts.
- Conversational memory with history-aware query rewriting, so follow-up questions work.
- Multi-language UI: English + Portuguese ship out of the box. Adding a language is a single JSON file.
- 100% free & offline mode with Ollama + HuggingFace embeddings — no API keys, no network.
- Citations by source: every chunk carries its source; YouTube citations deep-link to the exact timestamp.
- Strict-typed Python (`pyright` strict), linted by `ruff`, gated by pre-commit.
flowchart LR
UI[Streamlit UI]
i18n[i18n/]
loaders[loaders/]
splitter[splitter]
embeddings[embeddings]
vs[vector store]
chain[RunnableWithMessageHistory]
LLM[LLM]
UI --> loaders
i18n --> loaders
loaders --> splitter --> embeddings --> vs
vs --> chain --> LLM
All RAG modules are pure, testable functions — the Streamlit app wires them together but contains no retrieval logic itself.
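Roughly how those pieces snap together, as a minimal sketch rather than the project's exact API (the function name, prompt arguments, chunking parameters, and embedding model below are assumptions):

```python
# Minimal sketch only: the real wiring lives in src/polychat/rag/ and may differ.
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_huggingface import HuggingFaceEmbeddings
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.runnables.history import RunnableWithMessageHistory


def build_chain(docs, llm, rewrite_prompt, qa_prompt, history_for_session, persist_dir=None):
    # loaders produced `docs`; split -> embed -> index -> retrieve
    chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(docs)
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
    retriever = Chroma.from_documents(chunks, embeddings, persist_directory=persist_dir).as_retriever()

    # history-aware query rewriting, so follow-up questions become standalone queries
    retriever = create_history_aware_retriever(llm, retriever, rewrite_prompt)
    rag = create_retrieval_chain(retriever, create_stuff_documents_chain(llm, qa_prompt))

    # wrap with per-session chat history (the RunnableWithMessageHistory box in the diagram)
    return RunnableWithMessageHistory(
        rag,
        history_for_session,
        input_messages_key="input",
        history_messages_key="chat_history",
        output_messages_key="answer",
    )
```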
- Python 3.11+.
- uv recommended (plain `pip` works fine as a fallback).
- (Optional) Ollama for the fully-offline stack — install it and run `ollama serve`.
- (Optional) API keys for OpenAI / Anthropic / Groq. Fill in only the ones you plan to use.
The fastest way to try PolyChat is with Docker — no Python, no uv, no dependency juggling. Indexes and the HuggingFace model cache persist in a named volume.
Pull the prebuilt image (recommended):
curl -O https://raw.githubusercontent.com/leodrivera/polychat/main/.env.example
mv .env.example .env # then edit to add the provider keys you want
docker run --rm -p 8501:8501 \
--env-file .env \
-v polychat-data:/data \
  ghcr.io/leodrivera/polychat:latest
Docker Compose — cloud providers (OpenAI / Anthropic / Groq):
git clone -b main https://github.com/leodrivera/polychat.git
cd polychat
cp .env.example .env # fill in at least one API key
docker compose up  # add -d to run detached
Docker Compose — fully offline with Ollama (no API keys needed):
git clone -b main https://github.com/leodrivera/polychat.git
cd polychat
cp .env.example .env # HF_TOKEN recommended; no API keys required
docker compose -f docker-compose.ollama.yml up -d
# Pull a model (one-time):
docker compose -f docker-compose.ollama.yml exec ollama ollama pull llama3.1:8b
# In the sidebar: Provider = Ollama · Model = llama3.1:8b · Embeddings = HuggingFace (local)
Open http://localhost:8501.
- Images are published on every release tag to `ghcr.io/leodrivera/polychat` (tags: `latest`, `<major>`, `<major>.<minor>`, `<major>.<minor>.<patch>`). Multi-arch (`linux/amd64`, `linux/arm64`).
- State lives in the `polychat-data` volume. Wipe it with `docker compose down -v`.
git clone -b main https://github.com/leodrivera/polychat.git
cd polychat
uv sync # or: python -m venv .venv && source .venv/bin/activate && pip install -e .
cp .env.example .env # fill in the keys you want to use
uv run streamlit run src/polychat/app.py
# or, equivalently, once the package is installed:
polychat
Open http://localhost:8501 and you're in.
All configuration lives in .env (see .env.example):
OPENAI_API_KEY=
ANTHROPIC_API_KEY=
GROQ_API_KEY=
OLLAMA_BASE_URL=http://localhost:11434
HF_TOKEN= # optional — removes HuggingFace rate-limit warning
YT_PROXY_HTTP= # optional — only needed if YouTube blocks your IP
YT_PROXY_HTTPS=
Only the keys you actually use need values. The app validates each provider at Initialize RAG time and surfaces a localized error when a key is missing.
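A rough sketch of that check, assuming a simple provider-to-env-var map (the real validation and error keys live in the app and may differ):

```python
import os

# Assumed mapping; Ollama needs no key, only a reachable OLLAMA_BASE_URL.
REQUIRED_ENV = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "Groq": "GROQ_API_KEY",
}


def missing_key(provider: str) -> str | None:
    """Return the env var a provider still needs, or None if it is ready to use."""
    var = REQUIRED_ENV.get(provider)
    if var and not os.getenv(var):
        return var  # the UI maps this to a localized "missing API key" error
    return None
```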
Each scenario is a self-contained recipe: env vars, sidebar selections, and what to expect.
- Env: `OPENAI_API_KEY`.
- Sidebar → Model Selection: Provider = `OpenAI` · Model = `gpt-4o` · Embeddings = `OpenAI (cloud)`.
- Sidebar → File Upload: Persist = off.
- Notes: highest answer quality; cloud cost applies.
- Env: `GROQ_API_KEY`, `OPENAI_API_KEY`.
- Sidebar: Provider = `Groq` · Model = `llama-3.3-70b-versatile` · Embeddings = `OpenAI (cloud)`.
- Notes: Groq is ~10× faster than OpenAI at similar quality for this tier. Works well for interactive chat on large corpora.
- Env: `GROQ_API_KEY` (the free tier is plenty).
- Sidebar: Provider = `Groq` · Model = `llama-3.1-8b-instant` · Embeddings = `HuggingFace (local, multilingual)`.
- Notes: embeddings run locally on CPU (~120 MB download on first use, cached under `~/.cache/huggingface`). Zero OpenAI spend.
Prereqs:
# Install Ollama: https://ollama.com/download
ollama serve & # background
ollama pull llama3.1:8b # ~4.7 GB
# Smaller alternative:
# ollama pull phi3:mini   # ~2.3 GB
- Env: none required. Leave `.env` empty.
- Sidebar: Provider = `Ollama` · Model = `llama3.1:8b` · Embeddings = `HuggingFace (local, multilingual)`.
- Verify offline: disconnect your network — everything still works. The first query warms the Ollama model (a few seconds on CPU, instant on GPU).
- Env: `ANTHROPIC_API_KEY`.
- Sidebar: Provider = `Anthropic` · Model = `claude-sonnet-4-6` · Embeddings = `OpenAI (cloud)`.
- Notes: strong reasoning on technical PDFs; very large context windows.
Upload a product manual (e.g., manual.pdf) and ask What does a flashing red light mean? — the answer cites the PDF's filename.
Upload a small orders.csv; ask How many rows have country=BR? — each row becomes its own Document, so row-level questions land precisely.
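A sketch of why that works, assuming the loader emits one Document per row with the row index kept as citation metadata (the real loader lives in src/polychat/rag/loaders/ and may differ):

```python
import csv

from langchain_core.documents import Document


def load_csv(path: str) -> list[Document]:
    """One Document per CSV row, so row-level questions retrieve exact rows."""
    with open(path, newline="", encoding="utf-8") as f:
        return [
            Document(
                page_content="\n".join(f"{key}: {value}" for key, value in row.items()),
                metadata={"source": path, "row": i},  # cited back in answers
            )
            for i, row in enumerate(csv.DictReader(f))
        ]
```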
Great for READMEs, chat logs, or policy files. Simple and fast.
Paste a full URL like https://en.wikipedia.org/wiki/Retrieval-augmented_generation. Caveats: sites behind Cloudflare, paywalls, or JavaScript-only rendering may return 403 or produce empty content. For those, save the page as HTML or copy the article into a .txt and use the TXT source.
Paste any of:
- `https://www.youtube.com/watch?v=<id>`
- `https://youtu.be/<id>`
- `https://www.youtube.com/shorts/<id>`
- `https://www.youtube.com/embed/<id>`
The app fetches the transcript via youtube-transcript-api (v1.2.4+). Manually-created transcripts are preferred; auto-generated ones are used as fallback. The preferred language follows the UI locale (pt_br → tries pt-BR, pt, then en).
- Live streams and videos without transcripts are not supported.
- Shorts are supported.
- Each snippet's citation includes a deep link to the timestamp (e.g., `&t=42s`) — click it and you land at the exact moment.
If your IP is blocked by YouTube (common on cloud VMs), set YT_PROXY_HTTP / YT_PROXY_HTTPS in .env.
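A sketch of how those timestamped citations can be built from transcript snippets (the snippet field names and helper below are assumptions, not the project's actual loader):

```python
from langchain_core.documents import Document


def snippets_to_documents(video_id: str, snippets: list[dict]) -> list[Document]:
    """Turn transcript snippets (text + start time in seconds) into Documents
    whose source metadata deep-links to the exact moment in the video."""
    docs = []
    for snippet in snippets:
        start = int(snippet["start"])
        docs.append(
            Document(
                page_content=snippet["text"],
                metadata={
                    "source": f"https://www.youtube.com/watch?v={video_id}&t={start}s",
                    "start_seconds": start,
                },
            )
        )
    return docs
```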
- Enable Persist index on the sidebar before clicking Initialize RAG.
- This writes the vector store to `./chroma_db/` (or `/data/chroma_db/` in Docker).
- Restarting Streamlit will reload the persisted index automatically.
- To wipe it, delete the `chroma_db/` directory.
Model selection (provider, model, embeddings, temperature) is always persisted to `./polychat_prefs.json` — no checkbox needed. In Docker it resolves to `/data/polychat_prefs.json` inside the volume.
A `.fingerprint` sidecar records which embedding model produced the index. Switching embedding providers raises `errors.embeddings_changed` until you re-initialize — you can't mix vectors from different embedders.
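A minimal sketch of how such a sidecar check could work (the file's location and format here are assumptions, not the actual implementation):

```python
import json
from pathlib import Path

FINGERPRINT = Path("chroma_db") / ".fingerprint"  # assumed location, next to the index


def check_embeddings(provider: str, model: str) -> None:
    """Refuse to reuse an index built with a different embedding model."""
    current = {"provider": provider, "model": model}
    if FINGERPRINT.exists():
        if json.loads(FINGERPRINT.read_text()) != current:
            # surfaced in the UI as the localized errors.embeddings_changed message
            raise ValueError("embeddings_changed: re-initialize the RAG index")
    else:
        FINGERPRINT.parent.mkdir(parents=True, exist_ok=True)
        FINGERPRINT.write_text(json.dumps(current))
```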
- The selector lives in the top-right corner of the page (fixed position).
- Shipped: `en` (default) and `pt_br` (matching the original screenshots).
- Adding a new language (zero code):
cp src/polychat/i18n/en.json src/polychat/i18n/es.json
# Edit the file:
#   "_meta": { "name": "Español", "code": "es" }
# …then translate every other value.
Restart Streamlit — "Español" appears in the language selector automatically.
The UI language is independent from the answer language. The LLM answers in whatever language your question is written in, thanks to the QA prompt.
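For orientation, a rough sketch of how the locale files could be consumed; the real helpers are t() and available_locales() in src/polychat/i18n/, and their actual signatures may differ:

```python
import json
from pathlib import Path

LOCALE_DIR = Path("src/polychat/i18n")  # assumed path for this sketch


def available_locales() -> dict[str, str]:
    """Map locale code to display name, read from each file's _meta block."""
    locales = {}
    for path in LOCALE_DIR.glob("*.json"):
        meta = json.loads(path.read_text(encoding="utf-8")).get("_meta", {})
        locales[meta.get("code", path.stem)] = meta.get("name", path.stem)
    return locales


def t(key: str, locale: str = "en") -> str:
    """Look up a UI string, falling back to English, then to the key itself."""
    strings = json.loads((LOCALE_DIR / f"{locale}.json").read_text(encoding="utf-8"))
    fallback = json.loads((LOCALE_DIR / "en.json").read_text(encoding="utf-8"))
    return strings.get(key, fallback.get(key, key))
```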
| Symptom | Fix |
|---|---|
| `Missing API key for OpenAI` | Set `OPENAI_API_KEY` in `.env` (same for the other providers). |
| `This video does not have a transcript available.` | Try another video; captions must be enabled. Auto-generated subs are accepted. |
| YouTube `IpBlocked` / `RequestBlocked` | Set `YT_PROXY_HTTP` / `YT_PROXY_HTTPS` in `.env`. |
| Ollama "connection refused" | Start the daemon: `ollama serve`. Check `OLLAMA_BASE_URL`. |
| HuggingFace first download is slow | Expected. Models are cached under `~/.cache/huggingface`; subsequent runs are instant. |
| `Embeddings changed — please re-initialize the RAG` | You switched embedding providers after indexing. Click Initialize RAG again. |
| Streamlit file upload rejected | Default limit is 200 MB per file. Split very large PDFs or raise `server.maxUploadSize` in `.streamlit/config.toml`. |
uv sync --group dev
uv run pre-commit install # wire up git hooks
# Run the whole gate locally (same as CI):
uv run pre-commit run --all-files
# Piece by piece:
uv run ruff check .
uv run ruff format --check .
uv run pyright
uv run pytest # fast unit suite, no network
uv run pytest -m integration # opt-in, hits APIs / Ollama
uv run streamlit run src/polychat/app.py
Every commit runs ruff → pyright → unit tests → hygiene checks. CI runs the same gate on every push and pull request.
- Fork the repo and clone your fork.
- Branch from `dev`:
git checkout dev
git checkout -b feat/my-feature
- Set up the dev environment (see Development above).
- Make your changes, keeping all pre-commit checks green.
- Open a PR targeting `dev`.
polychat/
├── src/polychat/
│ ├── app.py # Streamlit entrypoint — UI wiring only
│ ├── cli.py # `polychat` console entrypoint
│ ├── i18n/ # t(), available_locales(), locale JSON files
│ ├── rag/
│ │ ├── loaders/ # PDF/CSV/TXT/site/YouTube loaders
│ │ ├── splitter.py
│ │ ├── embeddings.py # OpenAI + HuggingFace local
│ │ ├── vector_store.py # Chroma (only backend)
│ │ ├── prefs.py # model selection persistence (pure stdlib)
│ │ ├── llm.py # OpenAI / Anthropic / Groq / Ollama factory
│ │ └── chain.py # history-aware retrieval chain
│ └── prompts/qa.py
├── tests/
│ ├── conftest.py # DeterministicEmbeddings, YouTube fakes, tiny_locale_dir
│ ├── unit/ # fast, no-network suite (runs on every commit)
│ └── integration/ # opt-in; requires API keys / network
├── Dockerfile
├── docker-compose.yml # cloud providers (OpenAI / Anthropic / Groq)
├── docker-compose.ollama.yml # fully-offline stack with Ollama
├── .pre-commit-config.yaml
├── .github/workflows/ci.yml
├── .github/workflows/release.yml
├── .github/workflows/docker-publish.yml
├── pyproject.toml # deps + ruff + pyright + pytest config
└── .env.example
Explicitly not in v1:
- Multi-user auth or per-user indexes.
- OCR for scanned PDFs (assumes a text layer).
- Audio/video transcription (YouTube uses existing captions only).
- Re-ranking, hybrid BM25+vector, or query rewriting beyond the history-aware retriever.
- Auto-translating answers based on the UI language (answers follow the user's question language).
- An evaluation harness (`ragas`, `deepeval`).
MIT.