Run vLLM on RunPod GPUs with Tailscale networking. No public ports exposed.
## Features

- Runs vLLM (OpenAI-compatible API) on cloud GPUs
- Connects to your Tailscale network automatically
- Zero public exposure: only accessible via Tailscale
## Architecture

```
[Your Machine] ──Tailscale VPN──> [RunPod: vllm-server:8000]
Public Internet ────✗────> [Pod] (no ports exposed)
```
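A quick way to verify this topology once a pod is running (see Quick Start below) is to hit vLLM's `/health` endpoint over the tailnet. A minimal sketch, assuming `pip install requests` and that the pod's Tailscale hostname is `vllm-server` as in the diagram; the actual hostname depends on your pod name:

```python
import requests

# This only works from machines on your tailnet; from the public
# internet the hostname does not resolve and no port is exposed.
r = requests.get("http://vllm-server:8000/health", timeout=5)
print("vLLM reachable over Tailscale:", r.status_code == 200)
```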
## Prerequisites

- RunPod account + API key
- Tailscale account + auth key (reusable, ephemeral)
- Python 3.10+
## Quick Start

```bash
git clone https://github.com/dmitryturcan/runpod-vllm-tailscale.git
cd runpod-vllm-tailscale
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your API keys
```
## Commands

```bash
python -m podctl.cli create     # Create pod
python -m podctl.cli wait      # Wait for ready
python -m podctl.cli status    # Check status
python -m podctl.cli stop      # Stop (preserves volume)
python -m podctl.cli destroy   # Delete completely
python -m podctl.cli templates # List available templates
```
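These commands also compose in scripts. A minimal sketch that shells out to the same CLI, assuming the virtualenv from Quick Start is active:

```python
import subprocess

def podctl(*args: str) -> None:
    """Run a podctl subcommand and raise if it fails."""
    subprocess.run(["python", "-m", "podctl.cli", *args], check=True)

podctl("create")  # provision the pod
podctl("wait")    # block until vLLM is ready
```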
## Usage

Once the pod is ready, access the server via its Tailscale hostname:

```bash
curl http://qwen-coder:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-Coder-32B-Instruct",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```
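Because the endpoint is OpenAI-compatible, the official `openai` Python client works too. A minimal sketch, assuming `pip install openai` and a server started without `--api-key` (vLLM then ignores the key):

```python
from openai import OpenAI

# Point the client at the pod's Tailscale hostname; the API key is a
# placeholder, since vLLM only enforces one if started with --api-key.
client = OpenAI(base_url="http://qwen-coder:8000/v1", api_key="unused")

resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-Coder-32B-Instruct",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```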
## Templates

| Template | Model | Context | GPU |
|---|---|---|---|
| `qwen-72b-h200` | Qwen2.5-72B-Instruct-AWQ | 100K | H200 |
| `qwen-coder-32b` | Qwen2.5-Coder-32B-Instruct | 30K | A100 80GB |
| `qwen-coder-7b` | Qwen2.5-Coder-7B-Instruct | 32K | RTX 4090 |
| `deepseek-coder-33b` | DeepSeek-Coder-33B | 16K | A100 80GB |
| `devstral-24b` | Devstral-Small-2507 | 100K | A100 80GB |
Use: `python -m podctl.cli create --template qwen-72b-h200`
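To confirm which model a running pod serves (the ID must match the `model` field in requests), query the standard OpenAI-compatible `/v1/models` route, which vLLM implements. A sketch assuming `pip install requests`:

```python
import requests

# List the models served by the pod over Tailscale.
r = requests.get("http://qwen-coder:8000/v1/models", timeout=10)
r.raise_for_status()
for model in r.json()["data"]:
    print(model["id"])  # e.g. Qwen/Qwen2.5-Coder-32B-Instruct
```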
## Docker Image

Pre-built: `dmitryturcan/vllm-tailscale:latest`

Or build your own:

```bash
cd docker
docker build --platform linux/amd64 -t your-registry/vllm-tailscale:latest .
docker push your-registry/vllm-tailscale:latest
```

## License

MIT