vLLM Distillation Playground

This repo contains:

  • Backend: FastAPI + SQLAlchemy + Postgres, OSS teacher + OSS student vLLM clients
  • Frontend: React + Vite + Tailwind, auth, datasets, and a distillation playground
  • Infra: Docker Compose for Postgres and vLLM, plus simple Makefile helpers

What is knowledge distillation?

  • Goal: Train a smaller, cheaper student model to mimic a larger, more capable teacher model.
  • Why:
    • Run models with lower latency and cost in production.
    • Deploy on smaller GPUs or CPUs while retaining most of the teacher’s quality.
  • How (high level):
    • Send prompts to the teacher model and capture its responses.
    • Optionally compare teacher vs. student responses for the same prompt.
    • Use the collected (prompt, teacher_output) pairs, optionally including student_output for comparison, as a supervised training dataset for the student model.

In this repo, the playground focuses on the data collection side of distillation: creating prompt datasets and logging teacher/student responses in a structured way that can be exported for training.
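
For intuition, one collected record could look like the sketch below (Python-style, with hypothetical field names; the actual export schema may differ):

# Illustrative shape of one collected distillation record.
# Field names here are hypothetical, not the repo's exact export schema.
record = {
    "prompt": "Explain knowledge distillation in one paragraph.",
    "teacher_output": "Knowledge distillation trains a smaller student model to...",
    "student_output": "Distillation copies a big model into a small one by...",  # optional
}

# A supervised training set is then just a list of such records, with
# (prompt, teacher_output) serving as the input/target pairs.
dataset = [record]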


How this playground does distillation

At a high level:

  1. Create a project (dataset) from the UI.
  2. Enter prompts in the playground.
  3. For each prompt, the backend:
     • Calls the teacher OSS model (e.g., a larger Mistral/LLaMA variant).
     • Calls the student OSS model served by vLLM via an OpenAI-compatible API (see the sketch after this list).
  4. The backend stores the prompt + both responses in Postgres.
  5. You can replay prompts, iterate on them, and export all data for offline training.
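
Both model calls follow the same OpenAI-compatible chat-completions pattern. A minimal sketch, assuming the openai Python package, the vLLM server from docker-compose.yml on host port 8001, and a placeholder URL and model name for the teacher endpoint (this is not the repo's actual client code):

# Minimal sketch of the teacher/student calls; not the repo's actual client code.
from openai import OpenAI

# Student: the vLLM OpenAI-compatible server from docker-compose.yml (host port 8001).
student = OpenAI(base_url="http://localhost:8001/v1", api_key="unused")

# Teacher: any OpenAI-compatible endpoint serving a larger OSS model (placeholder URL).
teacher = OpenAI(base_url="http://localhost:8002/v1", api_key="unused")

def run_prompt(prompt: str) -> tuple[str, str]:
    """Return (teacher_output, student_output) for one prompt."""
    messages = [{"role": "user", "content": prompt}]
    t = teacher.chat.completions.create(model="your-teacher-model", messages=messages)
    s = student.chat.completions.create(model="mistralai/Mistral-7B-Instruct-v0.2", messages=messages)
    return t.choices[0].message.content, s.choices[0].message.content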

This gives you a repeatable loop:

  1. Design prompts → collect teacher/student data.
  2. Export dataset → train or fine-tune the student.
  3. Update the student model behind vLLM → repeat and compare.

System architecture overview

At a component level:

  • Frontend (React + Vite + Tailwind):
    • Auth flows (/register, /login).
    • Dataset (project) list and management.
    • Distillation playground UI for running teacher vs. student side by side.
  • Backend (FastAPI + SQLAlchemy + Postgres):
    • Auth endpoints (/auth/*).
    • Project and prompt management.
    • Integrations with:
      • An OSS teacher model client.
      • An OSS student model client talking to a vLLM OpenAI-compatible server.
  • Database (Postgres):
    • Persists users, projects, prompts, teacher outputs, and student outputs.
  • Model serving (vLLM):
    • Serves the student OSS model via an OpenAI-compatible HTTP API.

High-level architecture diagram

flowchart LR
    subgraph User
        B[Browser<br/>React + Vite UI]
    end

    subgraph Backend[FastAPI Backend]
        A1[Auth & Users]
        A2[Projects & Datasets]
        A3[Playground API<br/>Prompts & Runs]
        A4[Teacher Client]
        A5[Student Client<br/>vLLM/OpenAI]
    end

    subgraph DB[Postgres]
        D1[(Users)]
        D2[(Projects)]
        D3[(Prompts & Runs)]
    end

    subgraph Models
        T[Teacher OSS Model]
        S[vLLM Server<br/>Student OSS Model]
    end

    B <--> A1
    B <--> A2
    B <--> A3

    A1 <--> D1
    A2 <--> D2
    A3 <--> D3

    A4 --> T
    A5 --> S

    A3 --> A4
    A3 --> A5

Distillation data flow

sequenceDiagram
    participant U as User (Browser)
    participant FE as Frontend
    participant BE as Backend (FastAPI)
    participant T as Teacher Model
    participant S as vLLM Student
    participant DB as Postgres

    U->>FE: Enter prompt in playground
    FE->>BE: POST /projects/{id}/prompts/run
    BE->>T: Generate teacher response
    T-->>BE: Teacher output
    BE->>S: OpenAI-compatible /chat/completions
    S-->>BE: Student output
    BE->>DB: Store {project, prompt, teacher, student}
    BE-->>FE: Return both responses
    FE-->>U: Render side-by-side outputs

You can then export the data from the backend and feed it into your own training pipelines (PyTorch, Hugging Face, etc.).
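
As a sketch, an export could be turned into a Hugging Face dataset for supervised fine-tuning roughly like this (the "prompt" and "teacher" field names are assumptions about the export schema; adjust them to match the real file):

# Sketch: load an exported project file into Hugging Face datasets for SFT.
# The field names below are assumptions; adjust them to the actual export schema.
import json
from datasets import Dataset

with open("project_export.json") as f:
    records = json.load(f)

ds = Dataset.from_list(
    [{"prompt": r["prompt"], "completion": r["teacher"]} for r in records]
)
ds = ds.train_test_split(test_size=0.1)
print(ds)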

Directory layout

  • backend: FastAPI app (backend.main:app), models, schemas, and API routes
  • frontend: React/Vite SPA in frontend/src
  • docker-compose.yml: Local Postgres and vLLM services
  • Makefile: Convenience commands for running services and apps

Prerequisites

  • Python: 3.10+ (for the backend)
  • Node.js: 18+ and npm (for the frontend)
  • Docker + Docker Compose (for Postgres and vLLM)
  • GPU + recent NVIDIA drivers (recommended for the vLLM container)

Backend setup (FastAPI)

1. Create a virtualenv and install dependencies

cd backend
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate

pip install -e .

This uses pyproject.toml to install FastAPI, SQLAlchemy, psycopg, etc.

2. Configure environment

Copy the example env file and adjust values as needed:

cd backend
cp .env.example .env

Key fields:

  • DATABASE_URL: Defaults to postgresql+psycopg://postgres:postgres@localhost:5432/oss_distiller
  • TEACHER_OSS_MODEL_NAME: Larger OSS model used as the teacher
  • OSS_MODEL_NAME and VLLM_BASE_URL: Student OSS model and OpenAI-compatible server URL
  • JWT_SECRET_KEY: Change this in non-dev environments
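
For reference, a filled-in backend/.env might look like the following. The DATABASE_URL default and VLLM_BASE_URL come from this README, and the student model name matches the docker-compose vLLM command; the teacher model name is a placeholder you should replace:

# Example backend/.env (teacher model name is a placeholder)
DATABASE_URL=postgresql+psycopg://postgres:postgres@localhost:5432/oss_distiller
TEACHER_OSS_MODEL_NAME=your-teacher-model
OSS_MODEL_NAME=mistralai/Mistral-7B-Instruct-v0.2
VLLM_BASE_URL=http://localhost:8001
JWT_SECRET_KEY=change-me-outside-dev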

3. Run the backend

From the repo root:

cd backend
uvicorn backend.main:app --reload --host 0.0.0.0 --port 8000

The backend will:

  • Connect to Postgres using DATABASE_URL
  • Initialize tables on startup
  • Expose OpenAPI docs at http://localhost:8000/docs

Frontend setup (React + Vite)

1. Configure API base URL

From the frontend directory:

cd frontend
cp .env.example .env

The default points to the local backend:

  • VITE_API_BASE_URL: http://localhost:8000

2. Install dependencies and run dev server

cd frontend
npm install
npm run dev

The app will start at http://localhost:5173 and talk to the backend at http://localhost:8000.


Running Postgres and vLLM via Docker

1. Start services

From the repo root:

make services-up

This is equivalent to:

docker compose up -d db vllm

Services:

  • db:
    • Image: postgres:16
    • Port: 5432 exposed on the host
    • Default credentials match DATABASE_URL in backend/.env.example
  • vllm:
    • Image: vllm/vllm-openai:latest
    • Command: --model mistralai/Mistral-7B-Instruct-v0.2
    • Port: 8001 on host (mapped to container 8000)
    • Backend expects VLLM_BASE_URL="http://localhost:8001"

Note: The vLLM container expects a GPU and recent NVIDIA drivers. If you do not have a GPU, you can:

  • Comment out the vllm service in docker-compose.yml, and
  • Point VLLM_BASE_URL in backend/.env to any other OpenAI-compatible endpoint you control.
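
For reference, the vllm service in docker-compose.yml corresponds roughly to the sketch below, reconstructed from the values above (the GPU reservation block is standard Compose syntax, not necessarily a verbatim copy of the file):

# Sketch of the vllm service entry (reconstructed, not a verbatim copy)
vllm:
  image: vllm/vllm-openai:latest
  command: --model mistralai/Mistral-7B-Instruct-v0.2
  ports:
    - "8001:8000"  # host 8001 -> container 8000, hence VLLM_BASE_URL=http://localhost:8001
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: all
            capabilities: [gpu]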

2. Stop services

make services-down

or directly:

docker compose down

One-liner workflow

From the repo root, in three terminals:

  1. Start infra (Postgres + vLLM):
 make services-up
  2. Run the backend:
 make backend
  3. Run the frontend:
 make frontend

Then open http://localhost:5173 in your browser.


Authentication & playground flow

  • Register and log in via the frontend (/register and /login), which hits:
    • POST /auth/register
    • POST /auth/login
    • GET /auth/me
  • Create and use datasets (projects) from the UI:
    • / lists datasets (projects) and lets you:
      • Select them for training
      • Open the playground
      • Download an export as JSON
  • /projects/:projectId is the distillation playground:
    • Enter a prompt, then run an OSS teacher model and an OSS vLLM student model side by side.
    • Each run is persisted as a prompt plus two model responses.
    • You can re-run previous prompts and download the full dataset for that project.

Backend dataset exports are served from:

  • GET /prompts/export/project/{project_id}

The frontend’s dataset download buttons call this endpoint and save a nicely formatted JSON file for training.
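
If you want to pull an export outside the UI, here is a minimal sketch with Python requests (assuming login returns a Bearer JWT in an access_token field; both details are assumptions about the auth flow):

# Sketch: download a project's dataset export via the backend API.
# Assumes POST /auth/login returns {"access_token": ...}; adjust to the real response.
import requests

BASE = "http://localhost:8000"

login = requests.post(
    f"{BASE}/auth/login",
    json={"email": "you@example.com", "password": "secret"},  # placeholder credentials
)
token = login.json()["access_token"]

resp = requests.get(
    f"{BASE}/prompts/export/project/1",  # example project_id
    headers={"Authorization": f"Bearer {token}"},
)
with open("project_export.json", "wb") as f:
    f.write(resp.content)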
