Run vLLM on RunPod GPUs with Tailscale networking. No public ports exposed.
## Features

- Runs vLLM (OpenAI-compatible API) on cloud GPUs
- Connects to your Tailscale network automatically
- Zero public exposure: only accessible via Tailscale
## Architecture

```
[Your Machine] ──Tailscale VPN──> [RunPod: vllm-server:8000]
Public Internet ────✗────> [Pod] (no ports exposed)
```
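A quick way to verify this topology once a pod is running (see Quick Start below) is to hit vLLM's `/health` endpoint over the tailnet. A minimal sketch, assuming `pip install requests` and that the pod's Tailscale hostname is `vllm-server` as in the diagram; the actual hostname depends on your pod name:

```python
import requests

# This only works from machines on your tailnet; from the public
# internet the hostname does not resolve and no port is exposed.
r = requests.get("http://vllm-server:8000/health", timeout=5)
print("vLLM reachable over Tailscale:", r.status_code == 200)
```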
## Prerequisites

- RunPod account + API key
- Tailscale account + auth key (reusable, ephemeral)
- Python 3.10+
## Quick Start

```bash
git clone https://github.com/dmitryturcan/runpod-vllm-tailscale.git
cd runpod-vllm-tailscale
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env with your API keys
```
## Commands

```bash
python -m podctl.cli create     # Create pod
python -m podctl.cli wait      # Wait for ready
python -m podctl.cli status    # Check status
python -m podctl.cli stop      # Stop (preserves volume)
python -m podctl.cli destroy   # Delete completely
python -m podctl.cli templates # List available templates
```
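These commands also compose in scripts. A minimal sketch that shells out to the same CLI, assuming the virtualenv from Quick Start is active:

```python
import subprocess

def podctl(*args: str) -> None:
    """Run a podctl subcommand and raise if it fails."""
    subprocess.run(["python", "-m", "podctl.cli", *args], check=True)

podctl("create")  # provision the pod
podctl("wait")    # block until vLLM is ready
```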
## Usage

Once the pod is ready, access the server via its Tailscale hostname:

```bash
curl http://qwen-coder:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-Coder-32B-Instruct",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```
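Because the endpoint is OpenAI-compatible, the official `openai` Python client works too. A minimal sketch, assuming `pip install openai` and a server started without `--api-key` (vLLM then ignores the key):

```python
from openai import OpenAI

# Point the client at the pod's Tailscale hostname; the API key is a
# placeholder, since vLLM only enforces one if started with --api-key.
client = OpenAI(base_url="http://qwen-coder:8000/v1", api_key="unused")

resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-Coder-32B-Instruct",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```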
## Templates

| Template | Model | Context | GPU |
|---|---|---|---|
| `qwen-72b-h200` | Qwen2.5-72B-Instruct-AWQ | 100K | H200 |
| `qwen-coder-32b` | Qwen2.5-Coder-32B-Instruct | 30K | A100 80GB |
| `qwen-coder-7b` | Qwen2.5-Coder-7B-Instruct | 32K | RTX 4090 |
| `deepseek-coder-33b` | DeepSeek-Coder-33B | 16K | A100 80GB |
| `devstral-24b` | Devstral-Small-2507 | 100K | A100 80GB |
Use: `python -m podctl.cli create --template qwen-72b-h200`
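To confirm which model a running pod serves (the ID must match the `model` field in requests), query the standard OpenAI-compatible `/v1/models` route, which vLLM implements. A sketch assuming `pip install requests`:

```python
import requests

# List the models served by the pod over Tailscale.
r = requests.get("http://qwen-coder:8000/v1/models", timeout=10)
r.raise_for_status()
for model in r.json()["data"]:
    print(model["id"])  # e.g. Qwen/Qwen2.5-Coder-32B-Instruct
```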
## Docker Image

Pre-built: `dmitryturcan/vllm-tailscale:latest`

Or build your own:

```bash
cd docker
docker build --platform linux/amd64 -t your-registry/vllm-tailscale:latest .
docker push your-registry/vllm-tailscale:latest
```

## License

MIT