runpod-vllm-tailscale

Run vLLM on RunPod GPUs with Tailscale networking. No public ports exposed.

What It Does

  • Runs vLLM (an OpenAI-compatible API server) on cloud GPUs
  • Joins the pod to your Tailscale network automatically
  • Zero public exposure: the server is reachable only over Tailscale

[Your Machine] ──Tailscale VPN──> [RunPod: vllm-server:8000]

Public Internet ────✗────> [Pod]  (no ports exposed)
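
For a quick sanity check from a machine on your tailnet, here is a minimal Python sketch. It assumes the pod's Tailscale hostname is qwen-coder (the name used in the Connect example below) and that vLLM's /health endpoint is enabled; verify both against your own setup.

import urllib.request

# Reachable only from a machine on the same tailnet; the public
# internet cannot resolve or hit this address.
URL = "http://qwen-coder:8000/health"  # hostname from the Connect section

try:
    with urllib.request.urlopen(URL, timeout=5) as resp:
        print("vLLM is up, HTTP", resp.status)
except OSError as exc:
    print("Not reachable (is Tailscale up and the pod ready?):", exc)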

Quick Start

Prerequisites

  1. RunPod account + API key
  2. Tailscale account + auth key (reusable, ephemeral)
  3. Python 3.10+

Setup

git clone https://github.com/dmitryturcan/runpod-vllm-tailscale.git
cd runpod-vllm-tailscale

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

cp .env.example .env
# Edit .env with your RunPod API key and Tailscale auth key

Usage

python -m podctl.cli create    # Create pod
python -m podctl.cli wait      # Wait for ready
python -m podctl.cli status    # Check status
python -m podctl.cli stop      # Stop (preserves volume)
python -m podctl.cli destroy   # Delete completely
python -m podctl.cli templates # List available templates
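
These lifecycle commands compose naturally into scripts. A minimal sketch using only the standard library, assuming each subcommand exits non-zero on failure (check the CLI's actual exit-code behavior):

import subprocess
import sys

def podctl(*args: str) -> None:
    """Run a podctl subcommand, stopping on the first failure."""
    cmd = [sys.executable, "-m", "podctl.cli", *args]
    subprocess.run(cmd, check=True)  # raises CalledProcessError on nonzero exit

# Bring a pod up and block until vLLM is serving.
podctl("create")
podctl("wait")
podctl("status")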

Connect

Once the pod is ready, reach the API via its Tailscale hostname:

curl http://qwen-coder:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-Coder-32B-Instruct",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
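
Since the server speaks the OpenAI API, the official openai Python client also works. A sketch assuming the server does not enforce an API key, in which case any placeholder token is accepted:

from openai import OpenAI

# base_url points at the pod's Tailscale hostname; the token is a
# placeholder, assuming no API key is enforced on the server.
client = OpenAI(base_url="http://qwen-coder:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-Coder-32B-Instruct",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)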

Templates

Template             Model                        Context (tokens)   GPU
qwen-72b-h200        Qwen2.5-72B-Instruct-AWQ     100K               H200
qwen-coder-32b       Qwen2.5-Coder-32B-Instruct   30K                A100 80GB
qwen-coder-7b        Qwen2.5-Coder-7B-Instruct    32K                RTX 4090
deepseek-coder-33b   DeepSeek-Coder-33B           16K                A100 80GB
devstral-24b         Devstral-Small-2507          100K               A100 80GB

Use a template with:

python -m podctl.cli create --template qwen-72b-h200

Docker Image

Pre-built: dmitryturcan/vllm-tailscale:latest

Or build your own:

cd docker
docker build --platform linux/amd64 -t your-registry/vllm-tailscale:latest .
docker push your-registry/vllm-tailscale:latest

License

MIT
