This repo contains:
- Backend: FastAPI + SQLAlchemy + Postgres, OSS teacher + OSS student vLLM clients
- Frontend: React + Vite + Tailwind, auth, datasets, and a distillation playground
- Infra: Docker Compose for Postgres and vLLM, plus simple `Makefile` helpers
- Goal: Train a smaller, cheaper student model to mimic a larger, more capable teacher model.
- Why:
- Run models with lower latency and cost in production.
- Deploy on smaller GPUs or CPUs while retaining most of the teacher’s quality.
- How (high level):
- Send prompts to the teacher model and capture its responses.
- Optionally compare teacher vs. student responses for the same prompt.
- Use the collected `(prompt, teacher_output)` (and possibly `student_output`) pairs as a supervised training dataset for the student model.
In this repo, the playground focuses on the data collection side of distillation: creating prompt datasets and logging teacher/student responses in a structured way that can be exported for training.
At a high level:
- Create a project (dataset) from the UI.
- Enter prompts in the playground.
- For each prompt, the backend:
- Calls the teacher OSS model (e.g., a larger Mistral/LLaMA variant).
- Calls the student OSS model served by vLLM via an OpenAI-compatible API.
- The backend stores the prompt + both responses in Postgres.
- You can replay prompts, iterate on them, and export all data for offline training.
This gives you a repeatable loop:
- Design prompts → collect teacher/student data.
- Export dataset → train or fine-tune the student.
- Update the student model behind vLLM → repeat and compare.
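The core of this loop is reducing each collected run to a supervised pair. A minimal sketch, with illustrative field names (not the repo's exact export schema):

```python
def to_training_pair(rec: dict) -> dict:
    """Distillation uses the teacher's response as the supervision target."""
    return {"input": rec["prompt"], "target": rec["teacher_output"]}

# One hypothetical playground run; the student_output can be kept around
# for comparing the student against the teacher on the same prompt.
record = {
    "prompt": "Explain KV caching in one paragraph.",
    "teacher_output": "KV caching stores attention keys and values so ...",
    "student_output": "KV caching is a way to ...",
}

pair = to_training_pair(record)
```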
At a component level:
- Frontend (React + Vite + Tailwind):
  - Auth flows (`/register`, `/login`).
  - Dataset (project) list and management.
  - Distillation playground UI for running teacher vs. student side by side.
- Backend (FastAPI + SQLAlchemy + Postgres):
  - Auth endpoints (`/auth/*`).
  - Project and prompt management.
  - Integrations with:
    - An OSS teacher model client.
    - An OSS student model client talking to a vLLM OpenAI-compatible server.
- Database (Postgres):
- Persists users, projects, prompts, teacher outputs, and student outputs.
- Model serving (vLLM):
- Serves the student OSS model via an OpenAI-compatible HTTP API.
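Because vLLM speaks the OpenAI chat schema, a student client is just an HTTP POST. A minimal sketch, assuming the compose defaults for the base URL and model name (check `backend/.env` for the real values):

```python
import json
import urllib.request

VLLM_BASE_URL = "http://localhost:8001"  # assumed host port from docker-compose
MODEL = "mistralai/Mistral-7B-Instruct-v0.2"  # assumed student model name

def build_chat_request(prompt: str) -> dict:
    """Payload for POST {base}/v1/chat/completions (OpenAI chat schema)."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }

def run_student(prompt: str) -> str:
    """Send one prompt to the vLLM server and return the completion text."""
    req = urllib.request.Request(
        f"{VLLM_BASE_URL}/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```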
```mermaid
flowchart LR
  subgraph User
    B[Browser<br/>React + Vite UI]
  end
  subgraph Backend[FastAPI Backend]
    A1[Auth & Users]
    A2[Projects & Datasets]
    A3[Playground API<br/>Prompts & Runs]
    A4[Teacher Client]
    A5[Student Client<br/>vLLM/OpenAI]
  end
  subgraph DB[Postgres]
    D1[(Users)]
    D2[(Projects)]
    D3[(Prompts & Runs)]
  end
  subgraph Models
    T[Teacher OSS Model]
    S[vLLM Server<br/>Student OSS Model]
  end
  B <--> A1
  B <--> A2
  B <--> A3
  A1 <--> D1
  A2 <--> D2
  A3 <--> D3
  A4 --> T
  A5 --> S
  A3 --> A4
  A3 --> A5
```
```mermaid
sequenceDiagram
  participant U as User (Browser)
  participant FE as Frontend
  participant BE as Backend (FastAPI)
  participant T as Teacher Model
  participant S as vLLM Student
  participant DB as Postgres
  U->>FE: Enter prompt in playground
  FE->>BE: POST /projects/{id}/prompts/run
  BE->>T: Generate teacher response
  T-->>BE: Teacher output
  BE->>S: OpenAI-compatible /chat/completions
  S-->>BE: Student output
  BE->>DB: Store {project, prompt, teacher, student}
  BE-->>FE: Return both responses
  FE-->>U: Render side-by-side outputs
```
You can then export the data from the backend and feed it into your own training pipelines (PyTorch, Hugging Face, etc.).
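As a sketch of that handoff, here is one way to turn an export (assumed to be a JSON list of runs with `prompt` and `teacher_output` fields; check the actual export for the exact keys) into chat-style JSONL that common SFT pipelines accept:

```python
import json

def export_to_jsonl(records: list[dict]) -> str:
    """Convert exported runs into one chat-format JSON object per line."""
    lines = []
    for rec in records:
        lines.append(json.dumps({
            "messages": [
                {"role": "user", "content": rec["prompt"]},
                {"role": "assistant", "content": rec["teacher_output"]},
            ]
        }))
    return "\n".join(lines)
```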
- backend: FastAPI app (`backend.main:app`), models, schemas, and API routes
- frontend: React/Vite SPA in `frontend/src`
- docker-compose.yml: Local Postgres and vLLM services
- Makefile: Convenience commands for running services and apps
- Python: 3.10+ (for the backend)
- Node.js: 18+ and npm (for the frontend)
- Docker + Docker Compose (for Postgres and vLLM)
- GPU + recent NVIDIA drivers (recommended for the vLLM container)
```bash
cd backend
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -e .
```
This uses `pyproject.toml` to install FastAPI, SQLAlchemy, psycopg, etc.
Copy the example env file and adjust values as needed:
```bash
cd backend
cp .env.example .env
```
Key fields:
- `DATABASE_URL`: Defaults to `postgresql+psycopg://postgres:postgres@localhost:5432/oss_distiller`
- `TEACHER_OSS_MODEL_NAME`: Larger OSS model used as the teacher
- `OSS_MODEL_NAME` and `VLLM_BASE_URL`: Student OSS model and OpenAI-compatible server URL
- `JWT_SECRET_KEY`: Change this in non-dev environments
From the repo root (or backend directory):
```bash
cd backend
uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000
```
The backend will:
- Connect to Postgres using `DATABASE_URL`
- Initialize tables on startup
- Expose OpenAPI docs at http://localhost:8000/docs
From the frontend directory:
```bash
cd frontend
cp .env.example .env
```
The default points to the local backend:
- `VITE_API_BASE_URL`: `http://localhost:8000`
```bash
cd frontend
npm install
npm run dev
```
The app will start at http://localhost:5173 and talk to the backend at http://localhost:8000.
From the repo root:
```bash
make services-up
```
This is equivalent to:
```bash
docker compose up -d db vllm
```
Services:
- db:
  - Image: `postgres:16`
  - Port: `5432` exposed on the host
  - Default credentials match `DATABASE_URL` in `backend/.env.example`
- vllm:
  - Image: `vllm/vllm-openai:latest`
  - Command: `--model mistralai/Mistral-7B-Instruct-v0.2`
  - Port: `8001` on the host (mapped to container port `8000`)
  - Backend expects `VLLM_BASE_URL="http://localhost:8001"`
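A quick way to confirm the vLLM container is up is to hit its OpenAI-compatible `/v1/models` route. A small sketch, assuming the host port mapping above:

```python
import json
import urllib.request

def model_ids(models_response: dict) -> list[str]:
    """Extract model names from an OpenAI-style /v1/models response body."""
    return [m["id"] for m in models_response.get("data", [])]

def list_served_models(base_url: str = "http://localhost:8001") -> list[str]:
    """Ask the vLLM server which models it is serving."""
    with urllib.request.urlopen(f"{base_url}/v1/models") as resp:
        return model_ids(json.load(resp))
```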
Note: The vLLM container expects a GPU and recent NVIDIA drivers. If you do not have a GPU, you can:
- Comment out the `vllm` service in `docker-compose.yml`, and
- Point `VLLM_BASE_URL` in `backend/.env` to any other OpenAI-compatible endpoint you control.
```bash
make services-down
```
or directly:
```bash
docker compose down
```
From the repo root, in three terminals:
- Start infra (Postgres + vLLM): `make services-up`
- Run backend: `make backend`
- Run frontend: `make frontend`
Then open http://localhost:5173 in your browser.
- Register and log in via the frontend (`/register` and `/login`), which hits:
  - `POST /auth/register`
  - `POST /auth/login`
  - `GET /auth/me`
- Create and use datasets (projects) from the UI:
  - `/` lists datasets (projects) and lets you:
    - Select them for training
    - Open the playground
    - Download an export as JSON
  - `/projects/:projectId` is the distillation playground:
    - Enter a prompt, then run an OSS teacher model and OSS vLLM student model side by side
    - Each run is persisted as a prompt plus two model responses
    - You can re-run previous prompts and download the full dataset for that project
Backend dataset exports are served from:
```
GET /prompts/export/project/{project_id}
```
The frontend’s dataset download buttons call this endpoint and save a nicely formatted JSON file for training.
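You can also fetch an export directly for a training pipeline. A minimal sketch, assuming the endpoint requires a bearer token from `/auth/login` and returns JSON:

```python
import json
import urllib.request

def export_url(base: str, project_id: int) -> str:
    """Build the export endpoint URL documented above."""
    return f"{base}/prompts/export/project/{project_id}"

def download_export(base: str, project_id: int, token: str, path: str) -> None:
    """Fetch a project's dataset export and save it as pretty-printed JSON."""
    req = urllib.request.Request(
        export_url(base, project_id),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    with open(path, "w") as f:
        json.dump(data, f, indent=2)
```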