From b724b7516f0884e3af236994c01e2eae08c43ba4 Mon Sep 17 00:00:00 2001 From: Kevin Wang Date: Thu, 19 Mar 2026 03:51:44 +0000 Subject: [PATCH 1/2] docs(gateway): improve cluster deployment guide and refactor deploy scripts Documentation improvements based on hands-on 2-node cluster deployment: - Add CVM deployment section (section 3) with env vars, deploy steps, and port planning for single-host multi-node setups - Add app ingress and port routing section (section 4) explaining SNI format - Add test app deployment section (section 5) with end-to-end verification - Add resource sizing and networking modes overview - Fix subnet scheme to use SUBNET_INDEX 0-based allocation matching deploy script behavior - Add SUBNET_INDEX mapping table and cluster sync/peer discovery docs - Add image version note about sync enable logic differences - Fix gateway-1 config: first node should have empty bootnode - Expand verification section with health check criteria - Add Cloudflare DNS:Edit permission note Script changes: - Extract bootstrap logic from deploy-to-vmm.sh into standalone bootstrap-cluster.sh for independent admin configuration - Simplify deploy-to-vmm.sh to focus on CVM deployment only --- gateway/docs/cluster-deployment.md | 511 +++++++++++++++++++++--- gateway/dstack-app/bootstrap-cluster.sh | 90 +++++ gateway/dstack-app/deploy-to-vmm.sh | 158 +------- 3 files changed, 569 insertions(+), 190 deletions(-) create mode 100755 gateway/dstack-app/bootstrap-cluster.sh diff --git a/gateway/docs/cluster-deployment.md b/gateway/docs/cluster-deployment.md index 8fe0d575..b62a5e42 100644 --- a/gateway/docs/cluster-deployment.md +++ b/gateway/docs/cluster-deployment.md @@ -6,8 +6,10 @@ This document describes how to deploy a dstack-gateway cluster, including single 1. [Overview](#1-overview) 2. [Cluster Deployment (2-Node Example)](#2-cluster-deployment-2-node-example) -3. [Adding Reverse Proxy Domains](#3-adding-reverse-proxy-domains) -4. 
[Operations and Monitoring](#4-operations-and-monitoring) +3. [CVM Deployment via dstack-vmm](#3-cvm-deployment-via-dstack-vmm) +4. [App Ingress and Port Routing](#4-app-ingress-and-port-routing) +5. [Deploying a Test App](#5-deploying-a-test-app) +6. [Adding Reverse Proxy Domains](#6-adding-reverse-proxy-domains) ## 1. Overview @@ -69,42 +71,102 @@ enabled = true address = "unix:/run/dstack/admin.sock" # Use Unix Domain Socket ``` +### Resource Sizing + +Recommended minimum per gateway node: + +| Workload | vCPU | Memory | Disk | +|----------|------|--------|------| +| Small (< 100 CVMs) | 4 | 4 GB | 20 GB | +| Medium (100-1000 CVMs) | 8 | 8 GB | 40 GB | +| Large (> 1000 CVMs) | 16+ | 16+ GB | 40+ GB | + +### Networking Modes + +dstack CVM supports two networking modes: + +| Mode | Description | Port Mapping | Use Case | +|------|-------------|--------------|----------| +| `user` (default) | QEMU user-mode networking with explicit host port forwarding | Required — each service port must be mapped to a host port | Standard deployments; simple setup | +| `bridge` | CVM gets its own IP on the host bridge network | Not needed — CVM is directly addressable by its bridge IP | High-performance scenarios requiring full network throughput | + +In **user mode**, the CVM accesses the external network via QEMU's built-in NAT. Each service port (RPC, WireGuard, proxy, admin) is individually forwarded from a host port to the corresponding guest port. This is the default and works with any VMM configuration. + +In **bridge mode**, the CVM is attached to the host's bridge interface (e.g., `dstack-br0`) and receives its own IP address via DHCP or static assignment. All ports are directly accessible on that IP without port mapping. This avoids the overhead of QEMU user-mode NAT and is recommended for production deployments that need maximum network performance. + +To use bridge mode, set `NET_MODE=bridge` in the `.env` file. 
The VMM must have a bridge interface configured in `vmm.toml`: + +```toml +[cvm.networking] +mode = "bridge" # or keep "user" as default; bridge is selected per-VM via deploy script +bridge = "dstack-br0" +``` + ## 2. Cluster Deployment (2-Node Example) ### 2.1 Node Planning | Node | node_id | Gateway IP | Client IP range | bootnode | |------|---------|------------|-----------------|----------| -| gateway-1 | 1 | 10.8.128.1/16 | 10.8.128.0/18 | gateway-2 | -| gateway-2 | 2 | 10.8.0.1/16 | 10.8.0.0/18 | gateway-1 | +| gateway-1 | 1 | 10.8.0.1/16 | 10.8.0.0/18 | (none or gateway-2) | +| gateway-2 | 2 | 10.8.64.1/16 | 10.8.64.0/18 | gateway-1 | Notes: -- Each node's node_id must be unique -- Each node's Client IP range should not overlap (used for allocating IPs to different CVMs) -- bootnode is configured as another node's RPC URL, used for cluster discovery at startup +- Each node's `node_id` must be unique and greater than 0 +- Each node's Client IP range must not overlap (used for allocating IPs to different CVMs) +- `bootnode` is optional — it speeds up initial peer discovery but is not required. 
Without a bootnode, a node will auto-discover peers when they connect via sync RPC +- If a bootnode is set, its hostname must be resolvable before cluster bootstrap ### 2.2 CIDR Description Client IP range (/18): - /18 means the first 18 bits are the network prefix -- For example, 10.8.128.0/18 covers the address range 10.8.128.0 ~ 10.8.191.255 +- For example, 10.8.0.0/18 covers the address range 10.8.0.0 ~ 10.8.63.255 - Each Gateway's /18 range does not overlap, so each Gateway can allocate IPs locally without syncing with other Gateways +- With 4 possible /18 ranges in a /16 network, a cluster supports up to 4 gateway nodes Gateway IP (/16): - Gateway IP uses /16 netmask to allow network routing to cover the larger 10.8.0.0/16 address space - This way, when another Gateway allocates an address in a /18 subnet, traffic can still be correctly routed +Subnet mapping: + +| SUBNET_INDEX | Gateway IP | Client IP range | Address range | Usable IPs | +|-------------|------------|-----------------|---------------|------------| +| 0 | 10.8.0.1/16 | 10.8.0.0/18 | 10.8.0.0 ~ 10.8.63.255 | 16,382 | +| 1 | 10.8.64.1/16 | 10.8.64.0/18 | 10.8.64.0 ~ 10.8.127.255 | 16,382 | +| 2 | 10.8.128.1/16 | 10.8.128.0/18 | 10.8.128.0 ~ 10.8.191.255 | 16,382 | +| 3 | 10.8.192.1/16 | 10.8.192.0/18 | 10.8.192.0 ~ 10.8.255.255 | 16,382 | + ### 2.3 WireGuard Configuration Fields Key fields in the `[core.wg]` section: -- `ip`: Gateway's own WireGuard address in CIDR format (e.g., 10.8.128.1/16) -- `client_ip_range`: Address pool range for allocating to CVMs (e.g., 10.8.128.0/18) -- `reserved_net`: Reserved address range that will not be allocated to CVMs (e.g., 10.8.128.1/32, reserving the gateway's own address) +- `ip`: Gateway's own WireGuard address in CIDR format (e.g., 10.8.0.1/16) +- `client_ip_range`: Address pool range for allocating to CVMs (e.g., 10.8.0.0/18) +- `reserved_net`: Reserved address range that will not be allocated to CVMs (e.g., 10.8.0.1/32, reserving the gateway's own 
address) Recommendation: Design client_ip_range and reserved_net to ensure clear address pool planning for each Gateway, avoiding address conflicts. -### 2.4 Configuration File Examples +### 2.4 Cluster Sync and Peer Discovery + +> **Image version note**: The sync enable logic varies by gateway image version. In `dstacktee/dstack-gateway:0.5.8`, sync is enabled when `NODE_ID > 0` (regardless of `BOOTNODE_URL`). In some custom-built images, sync may only be enabled when `BOOTNODE_URL` is non-empty. Check your image's `entrypoint.sh` to confirm the behavior. When in doubt, set `NODE_ID > 0` and provide a `BOOTNODE_URL` on at least one node. + +Gateway nodes discover each other through two mechanisms: + +1. **Bootnode discovery** (active): A node with `bootnode` configured will fetch the peer list from the bootnode at startup, then periodically retry until peers are found. + +2. **Auto-discovery** (passive): When a remote node sends a sync request, the local node automatically adds it as a peer. This means the first node in a cluster does not need a bootnode — it will be discovered when the second node connects to it. + +This allows a simple deployment order: +1. Start gateway-1 with `bootnode = ""` (no bootnode) +2. Start gateway-2 with `bootnode = "https://rpc.gateway-1:9012"` +3. Gateway-2 fetches peers from gateway-1 and starts syncing +4. Gateway-1 auto-discovers gateway-2 from the incoming sync request + +> Note: `bootnode` is only used for initial discovery. Once peers are discovered, they are persisted in the KV store and survive restarts. 
+ +### 2.5 Configuration File Examples gateway-1.toml: @@ -141,7 +203,7 @@ enabled = true interval = "30s" timeout = "60s" my_url = "https://rpc.gateway-1.demo.dstack.org:9012" -bootnode = "https://rpc.gateway-2.demo.dstack.org:9012" +bootnode = "" node_id = 1 data_dir = "/var/lib/gateway/data" @@ -149,9 +211,9 @@ data_dir = "/var/lib/gateway/data" private_key = "" public_key = "" listen_port = 9013 -ip = "10.8.128.1/16" -reserved_net = ["10.8.128.1/32"] -client_ip_range = "10.8.128.0/18" +ip = "10.8.0.1/16" +reserved_net = ["10.8.0.1/32"] +client_ip_range = "10.8.0.0/18" config_path = "/var/lib/gateway/wg.conf" interface = "wg-gw1" endpoint = ":9013" @@ -181,17 +243,6 @@ mandatory = false kms_url = "https://kms.demo.dstack.org" rpc_domain = "rpc.gateway-2.demo.dstack.org" -[core.admin] -enabled = true -port = 9016 -address = "0.0.0.0" - -[core.debug] -insecure_enable_debug_rpc = true -insecure_skip_attestation = false -port = 9015 -address = "0.0.0.0" - [core.sync] enabled = true interval = "30s" @@ -205,9 +256,9 @@ data_dir = "/var/lib/gateway/data" private_key = "" public_key = "" listen_port = 9013 -ip = "10.8.0.1/16" -reserved_net = ["10.8.0.1/32"] -client_ip_range = "10.8.0.0/18" +ip = "10.8.64.1/16" +reserved_net = ["10.8.64.1/32"] +client_ip_range = "10.8.64.0/18" config_path = "/var/lib/gateway/wg.conf" interface = "wg-gw2" endpoint = ":9013" @@ -218,22 +269,358 @@ listen_port = 9014 external_port = 443 ``` -### 2.5 Verify Cluster Sync +### 2.6 Single-Host Deployment Notes + +If you run multiple gateway nodes on the same physical host (for example, multiple CVMs on one teepod / dstack-vmm host), the default example ports above will conflict. You must assign distinct host-facing ports per node. 
+ +Example host port plan for two nodes on one host: + +| Node | RPC (host) | Admin (host) | WireGuard (host) | Proxy (host) | Guest Agent (host) | +|------|------------|--------------|------------------|--------------|--------------------| +| gateway-1 | 19602 | 19603 | 19613/udp | 19643 | 19606 | +| gateway-2 | 19702 | 19703 | 19713/udp | 19743 | 19706 | + +Important: + +- All these host ports must be within the VMM's `port_mapping.range` configuration +- If both nodes should serve the same public wildcard domain on `:443`, place a TCP load balancer / `nginx stream` / HAProxy in front of them and fan out to the two proxy backend ports +- Each gateway VM must have a **unique name** when deployed to the same VMM (e.g., `dstack-gateway-1` and `dstack-gateway-2`) +- Create DNS records for the RPC hostnames before bootstrapping the cluster + +### 2.7 Verify Cluster Sync + +```bash +# Check sync status on any node (replace port with your admin port) +curl -s http://localhost:9016/prpc/WaveKvStatus | jq . + +# List known cluster nodes +curl -s http://localhost:9016/prpc/Status | jq '.nodes' +``` + +A healthy cluster sync shows: +- `enabled: true` on all nodes +- Each node appears in every other node's `.nodes` array +- `last_seen` timestamps are recent (within the sync interval) +- `peer_ack` values are close to `local_ack` (no large lag) + +Example of a healthy 2-node status: + +```json +{ + "id": 1, + "url": "https://rpc.gateway-1:9012", + "num_connections": 5, + "nodes": [ + {"id": 1, "url": "https://rpc.gateway-1:9012", "last_seen": 1773884104}, + {"id": 2, "url": "https://rpc.gateway-2:9012", "last_seen": 1773884100} + ] +} +``` + +## 3. CVM Deployment via dstack-vmm + +When deploying gateways as CVMs via `gateway/dstack-app/`, the deployment is automated through `deploy-to-vmm.sh`. This section explains the CVM-specific workflow. 
+ +### 3.1 Prerequisites + +- A running dstack-vmm instance with available resources +- A dstack OS image (e.g., `dstack-0.5.8`) +- DNS records for the service domain and RPC hostnames (see section 3.2) +- A Cloudflare API token with DNS edit permissions for the zone (for ACME DNS-01 challenges) + +### 3.2 DNS Records + +Before deploying, create the following DNS records pointing to the host's public IP: + +| Record | Type | Value | Purpose | +|--------|------|-------|---------| +| `*.example.com` | A | `` | Wildcard for proxy traffic | +| `gateway-1.example.com` | A | `` | RPC hostname for node 1 | +| `gateway-2.example.com` | A | `` | RPC hostname for node 2 | + +> **Note**: If your wildcard record (`*.example.com`) already covers subdomains like `gateway-1.example.com`, you only need explicit A records for hostnames that must resolve before the wildcard is created, or when using a different IP per node. In most single-host deployments, the wildcard alone is sufficient for all subdomains. + +### 3.2.1 Known Issues with `.env` Template + +The auto-generated `.env` template (created on first run of `deploy-to-vmm.sh`) is missing the `KMS_URL` variable, which is required. You must add it manually: ```bash -# Check sync status on any node -curl -s http://localhost:9016/prpc/Admin.WaveKvStatus | jq . +KMS_URL=https://your-kms-endpoint:port ``` -## 3. Adding Reverse Proxy Domains +### 3.3 GATEWAY_APP_ID + +Each gateway cluster shares a single `GATEWAY_APP_ID`. This ID determines the cryptographic identity of the gateway and must be the same across all nodes in the cluster. + +- **On-chain KMS**: Set `GATEWAY_APP_ID` to the registered app contract address (e.g., `4d6e361b90b3510da8611fe771b1bfddc8ffa4b8`). You must also whitelist the compose hash on-chain (printed by `deploy-to-vmm.sh` as `Compose hash: 0x...`) before the CVM can boot successfully. +- **Test KMS** (dev mode): Define any hex string as the app ID (e.g., `deadbeef0123456789abcdef0123456789abcdef`). 
All nodes must use the same value. No compose hash whitelisting is needed in dev mode. + +### 3.4 Environment Variables + +The `.env` file configures the deployment. Key variables: -Gateway supports automatic TLS certificate management via the ACME protocol. +| Variable | Required | Description | +|----------|----------|-------------| +| `VMM_RPC` | Yes | VMM RPC endpoint (e.g., `http://127.0.0.1:12000` or `unix:../build/vmm.sock`) | +| `SRV_DOMAIN` | Yes | Service domain (e.g., `example.com`). Used for ZT-Domain and default RPC_DOMAIN | +| `PUBLIC_IP` | Yes | Host's public IPv4 address | +| `NODE_ID` | Yes | Unique node ID (1, 2, ...). Must be > 0 for sync to be enabled | +| `GATEWAY_APP_ID` | Yes | App ID (see section 3.3) | +| `KMS_URL` | Yes | KMS endpoint URL | +| `MY_URL` | Yes | This node's RPC URL (e.g., `https://gateway-1.example.com:19602`) | +| `CF_API_TOKEN` | Yes | Cloudflare API token for DNS-01 ACME challenges | +| `BOOTNODE_URL` | No | Another node's RPC URL for initial peer discovery | +| `SUBNET_INDEX` | No | Subnet index (0-3), determines WG IP allocation. Default: 0 | +| `NET_MODE` | No | `bridge` or `user` (default: `user`). In `user` mode, ports are explicitly forwarded from host to guest. In `bridge` mode, the CVM gets its own IP on the host bridge and all ports are directly accessible — no host port mapping needed, but the VMM must have a bridge interface configured (see [Networking Modes](#networking-modes)) | +| `OS_IMAGE` | No | dstack OS image name. Default: `dstack-0.5.5` | +| `ACME_STAGING` | No | `yes` to use Let's Encrypt staging. Default: `no` | +| `GATEWAY_IMAGE` | No | Docker image for the gateway container | +| `RPC_DOMAIN` | No | RPC hostname for this node. Default: `gateway.` | -### 3.1 Configure ACME Service +> **Note on `RPC_DOMAIN`**: By default, `RPC_DOMAIN` is derived as `gateway.`. 
In a multi-node cluster where each node has its own RPC hostname (e.g., `gateway-1.example.com`, `gateway-2.example.com`), the `MY_URL` already identifies each node uniquely. The `RPC_DOMAIN` controls the hostname used for the RA-TLS certificate on the RPC endpoint. + +Port variables (required when `NET_MODE=user`): + +| Variable | Default | Description | +|----------|---------|-------------| +| `GATEWAY_RPC_ADDR` | `0.0.0.0:9202` | Host address for RPC port | +| `GATEWAY_ADMIN_RPC_ADDR` | `127.0.0.1:9203` | Host address for admin port | +| `GATEWAY_SERVING_PORT` | `9204` | Host port for proxy traffic | +| `GUEST_AGENT_ADDR` | `127.0.0.1:9206` | Host address for guest agent | +| `WG_ADDR` | `0.0.0.0:9202` | Host address for WireGuard UDP. Defaults to the same port as `GATEWAY_RPC_ADDR` | + +> **Note**: By default, `GATEWAY_RPC_ADDR` and `WG_ADDR` share the same host port (9202) — this works because RPC uses TCP and WireGuard uses UDP. When deploying multiple nodes on the same host, each node must use a different port number for both. If you only set `GATEWAY_RPC_ADDR`, remember to also set `WG_ADDR` to match (or to a different port if desired). + +### 3.5 Deployment Steps + +```bash +cd gateway/dstack-app + +# 1. First run creates a template .env — edit it with your values +bash deploy-to-vmm.sh + +# 2. Edit .env (set VMM_RPC, SRV_DOMAIN, PUBLIC_IP, NODE_ID, GATEWAY_APP_ID, KMS_URL, MY_URL, CF_API_TOKEN, etc.) + +# 3. Deploy node 1 (no BOOTNODE_URL needed) +bash deploy-to-vmm.sh + +# 4. Bootstrap admin config (only once per cluster) +bash bootstrap-cluster.sh + +# 5. For node 2, create a separate directory with its own .env: +cp -r . 
../deploy-node2 && cd ../deploy-node2 +# Edit .env with: +# - NODE_ID=2 +# - SUBNET_INDEX=1 +# - MY_URL=https://gateway-2.example.com: +# - Different port assignments if on same host (see section 2.6) +# - BOOTNODE_URL= (optional, speeds up discovery) +# Edit deploy-to-vmm.sh: change --name to dstack-gateway-2 +bash deploy-to-vmm.sh +# No need to run bootstrap-cluster.sh — config syncs from node 1 +``` + +The deploy script (`deploy-to-vmm.sh`) automatically: +- Computes WG IP allocation from SUBNET_INDEX +- Creates the app-compose.json and encrypts environment variables via KMS +- Deploys the CVM to the VMM + +The bootstrap script (`bootstrap-cluster.sh`) configures: +- ACME certbot settings +- DNS credentials (Cloudflare) +- ZT-Domain registration + +**Important**: Admin bootstrap (ACME config, DNS credentials, ZT-Domain setup) is a separate step via `bootstrap-cluster.sh` and only needs to run once per cluster. Additional nodes receive these configurations automatically via cluster sync. + +```bash +# After deploying the first node: +bash bootstrap-cluster.sh # reads GATEWAY_ADMIN_RPC_ADDR from .env +bash bootstrap-cluster.sh 192.168.1.10:8001 # or specify admin address directly +``` + +**VM naming**: The deploy script hardcodes `--name dstack-gateway`. When deploying multiple nodes to the same VMM, you **must** edit the script to use unique names (e.g., `dstack-gateway-1`, `dstack-gateway-2`), otherwise deployment will fail with a name conflict. 
A recommended approach is to copy the entire `dstack-app/` directory per node: + +```bash +# Create per-node deployment directories +cp -r gateway/dstack-app gateway/deploy-node1 +cp -r gateway/dstack-app gateway/deploy-node2 + +# In each directory's deploy-to-vmm.sh, change the --name argument: +# deploy-node1: --name dstack-gateway-1 +# deploy-node2: --name dstack-gateway-2 +# Edit each directory's .env with node-specific values +``` + +> **Tip**: Consider parameterizing the VM name via an environment variable (e.g., `VM_NAME` in `.env`) instead of editing the script directly, to avoid accidentally losing the change when updating `deploy-to-vmm.sh`. + +### 3.6 Updating Environment Variables + +To update a running gateway's environment (e.g., adding BOOTNODE_URL): + +```bash +# Create updated env file +cat > updated.env <: +MY_URL=https://gateway-1.example.com:19602 +BOOTNODE_URL=https://gateway-2.example.com:19702 +# ... other vars ... +EOF + +# Update and restart +vmm-cli.py --url update-env --env-file updated.env +vmm-cli.py --url stop +vmm-cli.py --url start +``` + +## 4. App Ingress and Port Routing + +When a CVM registers with the gateway, its services become accessible via subdomains of the gateway's ZT-Domain. The gateway determines the backend port from the **SNI hostname** of the incoming request, not from the docker-compose port mapping. + +### SNI Format + +``` +[-[][s|g]]. +``` + +| SNI Pattern | Backend Port | Mode | +|-------------|-------------|------| +| `.example.com` | **80** (default) | TLS termination → TCP | +| `-8080.example.com` | 8080 | TLS termination → TCP | +| `-443s.example.com` | 443 | TLS passthrough (no termination) | +| `-50051g.example.com` | 50051 | TLS termination → HTTP/2 (gRPC) | + +**Common pitfall**: If your docker-compose uses `443:80` (host 443, container 80), the container listens on CVM port 443, but the default SNI (no port suffix) routes to port **80**. 
Either: +- Use `80:80` in your compose so the default port matches, or +- Access via `-443.example.com` to explicitly target port 443 + +### Example + +```bash +INSTANCE_ID="abc123..." +BASE_DOMAIN="example.com" +GW_PROXY_PORT=19643 + +# Default (port 80) +curl -sk --resolve "${INSTANCE_ID}.${BASE_DOMAIN}:${GW_PROXY_PORT}:127.0.0.1" \ + "https://${INSTANCE_ID}.${BASE_DOMAIN}:${GW_PROXY_PORT}/" + +# Explicit port 8080 +curl -sk --resolve "${INSTANCE_ID}-8080.${BASE_DOMAIN}:${GW_PROXY_PORT}:127.0.0.1" \ + "https://${INSTANCE_ID}-8080.${BASE_DOMAIN}:${GW_PROXY_PORT}/" +``` + +## 5. Deploying a Test App + +After deploying the gateway cluster, verify end-to-end connectivity by deploying a simple app CVM that registers with the gateway. + +### 5.1 Create App Compose + +```bash +cat > /tmp/test-app-compose.yaml <<'EOF' +services: + web: + image: nginx:alpine + ports: + - "80:80" +EOF +``` + +> **Note**: Use `80:80` (not `443:80`) so the default SNI routing (port 80) matches. See [Section 4](#4-app-ingress-and-port-routing) for details. 
+ +### 5.2 Generate and Deploy + +```bash +CLI="vmm-cli.py --url " + +# Generate app-compose.json with gateway registration enabled +$CLI compose \ + --docker-compose /tmp/test-app-compose.yaml \ + --name test-app \ + --kms \ + --gateway \ + --public-logs \ + --public-sysinfo \ + --output /tmp/test-app-compose.json + +# Deploy the CVM, pointing to one of the gateway nodes +$CLI deploy \ + --name test-app \ + --app-id \ + --compose /tmp/test-app-compose.json \ + --kms-url \ + --gateway-url https://gateway-1.example.com: \ + --image dstack-0.5.8 \ + --vcpu 2 \ + --memory 2G +``` + +Wait for boot to complete: + +```bash +$CLI info +# Boot Progress should show: done +# Note the Instance ID from the output +``` + +### 5.3 Verify Gateway Registration + +Check that the gateway sees the new app: + +```bash +curl -s http://localhost:/prpc/Status | jq '.hosts' +``` + +Expected output should include an entry with the app's `instance_id` and an assigned WireGuard IP: + +```json +[{ + "instance_id": "", + "ip": "10.8.0.2", + "app_id": "", + "base_domain": "example.com", + "latest_handshake": 1773890133 +}] +``` + +### 5.4 Test Proxy Access + +Access the app through the gateway proxy on each node: + +```bash +INSTANCE_ID="" +BASE_DOMAIN="example.com" + +# Via node 1 +curl -sk --resolve "${INSTANCE_ID}.${BASE_DOMAIN}::127.0.0.1" \ + "https://${INSTANCE_ID}.${BASE_DOMAIN}:/" + +# Via node 2 +curl -sk --resolve "${INSTANCE_ID}.${BASE_DOMAIN}::127.0.0.1" \ + "https://${INSTANCE_ID}.${BASE_DOMAIN}:/" +``` + +Both should return the nginx welcome page, confirming: +- App CVM registered with the gateway via WireGuard +- Cluster sync propagated the app info to all nodes +- TLS termination and proxy forwarding work on both nodes + +### 5.5 Clean Up + +```bash +$CLI remove +``` + +## 6. Adding Reverse Proxy Domains + +Gateway supports automatic TLS certificate management via the ACME protocol. Configuration can be done via Admin API or Web UI. 
+ +> Note: `deploy-to-vmm.sh` only deploys the CVM. ACME, DNS credentials, and the ZT-Domain are bootstrapped separately by running `bootstrap-cluster.sh` once per cluster (see section 3.5). The steps below are only needed for manual configuration or for adding further domains. + +### 6.1 Configure ACME Service ```bash # Set ACME URL (Let's Encrypt production) -curl -X POST "http://localhost:9016/prpc/Admin.SetCertbotConfig" \ +curl -X POST "http://localhost:9016/prpc/SetCertbotConfig" \ -H "Content-Type: application/json" \ -d '{"acme_url": "https://acme-v02.api.letsencrypt.org/directory"}' @@ -241,14 +628,16 @@ curl -X POST "http://localhost:9016/prpc/Admin.SetCertbotConfig" \ # "acme_url": "https://acme-staging-v02.api.letsencrypt.org/directory" ``` -### 3.2 Configure DNS Credential +### 6.2 Configure DNS Credential Gateway uses DNS-01 validation, which requires configuring DNS provider API credentials. +The Cloudflare API token needs the **DNS:Edit** permission on the target zone. Create one at [Cloudflare API Tokens](https://dash.cloudflare.com/profile/api-tokens) with the "Edit zone DNS" template. + Cloudflare example: ```bash -curl -X POST "http://localhost:9016/prpc/Admin.CreateDnsCredential" \ +curl -X POST "http://localhost:9016/prpc/CreateDnsCredential" \ -H "Content-Type: application/json" \ -d '{ "name": "cloudflare-prod", @@ -258,9 +647,14 @@ curl -X POST "http://localhost:9016/prpc/Admin.CreateDnsCredential" \ }' ``` -### 3.3 Add Domain +### 6.3 Add Domain + +Call the `AddZtDomain` API to add a domain. Gateway will automatically request a `*.domain` wildcard certificate. + +Before adding a domain: -Call the Admin.AddZtDomain API to add a domain. Gateway will automatically request a *.domain wildcard certificate.
+- Point the wildcard DNS record (for example `*.example.com`) to your public load balancer / proxy +- If you use dedicated RPC hostnames such as `rpc1.example.com` and `rpc2.example.com`, make sure those A/AAAA records also exist before cluster bootstrap Parameter description: @@ -275,7 +669,7 @@ Parameter description: Basic usage (using default DNS credential): ```bash -curl -X POST "http://localhost:9016/prpc/Admin.AddZtDomain" \ +curl -X POST "http://localhost:9016/prpc/AddZtDomain" \ -H "Content-Type: application/json" \ -d '{"domain": "example.com", "port": 443}' ``` @@ -283,7 +677,7 @@ curl -X POST "http://localhost:9016/prpc/Admin.AddZtDomain" \ Specifying DNS credential and node binding: ```bash -curl -X POST "http://localhost:9016/prpc/Admin.AddZtDomain" \ +curl -X POST "http://localhost:9016/prpc/AddZtDomain" \ -H "Content-Type: application/json" \ -d '{ "domain": "internal.example.com", @@ -312,22 +706,39 @@ Response example: } ``` -Note: After adding a domain, the certificate is not issued immediately. Gateway will request the certificate asynchronously in the background. You can check certificate status via section 3.5, or manually trigger certificate request via section 3.4. +Note: After adding a domain, the certificate is not issued immediately. Gateway will request the certificate asynchronously in the background. You can check certificate status via section 6.5, or manually trigger certificate request via section 6.4. -### 3.4 Manually Trigger Certificate Renewal +### 6.4 Manually Trigger Certificate Renewal ```bash -curl -X POST "http://localhost:9016/prpc/Admin.RenewZtDomainCert" \ +curl -X POST "http://localhost:9016/prpc/RenewZtDomainCert" \ -H "Content-Type: application/json" \ -d '{"domain": "example.com", "force": true}' ``` -### 3.5 Check Certificate Status +### 6.5 Check Certificate Status ```bash -curl -s http://localhost:9016/prpc/Admin.Status | jq '.zt_domains' +curl -s http://localhost:9016/prpc/ListZtDomains | jq . 
+``` + +A healthy certificate shows `has_cert: true` and `loaded_in_memory: true`: + +```json +{ + "domains": [{ + "config": {"domain": "example.com", "port": 443, "priority": 100}, + "cert_status": { + "has_cert": true, + "not_after": 1781656344, + "issued_by": 2, + "issued_at": 1773883856, + "loaded_in_memory": true + } + }] +} ``` -### 3.6 Web UI +### 6.6 Web UI -All the above command-line operations can also be performed via Web UI by visiting http://localhost:9016 in a browser. +All the above command-line operations can also be performed via Web UI by visiting `http://localhost:9016` in a browser. diff --git a/gateway/dstack-app/bootstrap-cluster.sh b/gateway/dstack-app/bootstrap-cluster.sh new file mode 100755 index 00000000..06bae6da --- /dev/null +++ b/gateway/dstack-app/bootstrap-cluster.sh @@ -0,0 +1,90 @@ +#!/bin/bash + +# SPDX-FileCopyrightText: © 2025 Phala Network +# +# SPDX-License-Identifier: Apache-2.0 + +# Bootstrap the gateway admin API with ACME config, DNS credentials, and ZT-Domain. +# This only needs to run once per cluster — additional nodes sync config automatically. +# +# Usage: +# bash bootstrap-cluster.sh # Uses GATEWAY_ADMIN_RPC_ADDR from .env +# bash bootstrap-cluster.sh # Explicit admin address (e.g., 127.0.0.1:19603) + +# Load .env if present +if [ -f ".env" ]; then + set -a + source .env + set +a +fi + +ADMIN_ADDR="${1:-${GATEWAY_ADMIN_RPC_ADDR:-127.0.0.1:9203}}" + +echo "Waiting for gateway admin API at $ADMIN_ADDR..." +max_retries=60 +retry=0 +while [ $retry -lt $max_retries ]; do + if curl -sf "http://$ADMIN_ADDR/prpc/Status" >/dev/null 2>&1; then + break + fi + retry=$((retry + 1)) + sleep 5 +done + +if [ $retry -eq $max_retries ]; then + echo "ERROR: admin API not ready after $max_retries retries" + echo "You can configure the gateway manually via the Web UI at http://$ADMIN_ADDR" + exit 1 +fi + +echo "Admin API ready, bootstrapping configuration..." 
+ +# Set ACME URL +if [ "$ACME_STAGING" = "yes" ]; then + ACME_URL="https://acme-staging-v02.api.letsencrypt.org/directory" +else + ACME_URL="https://acme-v02.api.letsencrypt.org/directory" +fi + +echo "Setting certbot config (ACME URL: $ACME_URL)..." +curl -sf -X POST "http://$ADMIN_ADDR/prpc/SetCertbotConfig" \ + -H "Content-Type: application/json" \ + -d '{"acme_url":"'"$ACME_URL"'","renew_interval_secs":3600,"renew_before_expiration_secs":864000,"renew_timeout_secs":300}' >/dev/null \ + && echo " Certbot config set" || echo " WARN: failed to set certbot config" + +# Create DNS credential if CF_API_TOKEN is provided and no credentials exist yet +if [ -n "$CF_API_TOKEN" ]; then + existing=$(curl -sf "http://$ADMIN_ADDR/prpc/ListDnsCredentials" 2>/dev/null) + cred_count=$(echo "$existing" | jq -r '.credentials | length' 2>/dev/null || echo "0") + + if [ "$cred_count" = "0" ]; then + echo "Creating default DNS credential..." + curl -sf -X POST "http://$ADMIN_ADDR/prpc/CreateDnsCredential" \ + -H "Content-Type: application/json" \ + -d '{"name":"cloudflare","provider_type":"cloudflare","cf_api_token":"'"$CF_API_TOKEN"'","set_as_default":true}' >/dev/null \ + && echo " DNS credential created" || echo " WARN: failed to create DNS credential" + else + echo " DNS credentials already exist ($cred_count), skipping" + fi +else + echo " WARN: CF_API_TOKEN not set, skipping DNS credential creation" +fi + +# Add ZT-Domain if SRV_DOMAIN is provided and domain doesn't exist yet +if [ -n "$SRV_DOMAIN" ]; then + existing=$(curl -sf "http://$ADMIN_ADDR/prpc/ListZtDomains" 2>/dev/null) + has_domain=$(echo "$existing" | jq -r '.domains[]? | select(.config.domain=="'"$SRV_DOMAIN"'") | .config.domain' 2>/dev/null) + + if [ -z "$has_domain" ]; then + echo "Adding ZT-Domain: $SRV_DOMAIN..."
+ curl -sf -X POST "http://$ADMIN_ADDR/prpc/AddZtDomain" \ + -H "Content-Type: application/json" \ + -d '{"domain":"'"$SRV_DOMAIN"'","port":443,"priority":100}' >/dev/null \ + && echo " ZT-Domain added" || echo " WARN: failed to add ZT-Domain" + else + echo " ZT-Domain $SRV_DOMAIN already exists, skipping" + fi +fi + +echo "Bootstrap complete" +echo "Gateway Web UI: http://$ADMIN_ADDR" diff --git a/gateway/dstack-app/deploy-to-vmm.sh b/gateway/dstack-app/deploy-to-vmm.sh index 902810f3..65a61a18 100755 --- a/gateway/dstack-app/deploy-to-vmm.sh +++ b/gateway/dstack-app/deploy-to-vmm.sh @@ -5,7 +5,6 @@ # SPDX-License-Identifier: Apache-2.0 APP_COMPOSE_FILE="" - usage() { echo "Usage: $0 [-c ]" echo " -c App compose file" @@ -68,7 +67,8 @@ ACME_STAGING=no # Networking mode: bridge or user (default: user) # NET_MODE=bridge -# Subnet index. 0~15 +# Subnet index (0~3). Each index gets a /18 range within 10.8.0.0/16. +# Must be unique per gateway node in the cluster. SUBNET_INDEX=0 # My URL @@ -142,10 +142,16 @@ if [ -z "$RPC_DOMAIN" ]; then fi # Calculate WireGuard IP allocation from SUBNET_INDEX -WG_IP_PREFIX="10.$((SUBNET_INDEX + 240)).0" -WG_IP="${WG_IP_PREFIX}.1/12" -WG_RESERVED_NET="${WG_IP_PREFIX}.1/32" -WG_CLIENT_RANGE="${WG_IP_PREFIX}.0/16" +# Each node gets a /18 client range (16k addresses) within the 10.8.0.0/16 network. +# Gateway IP uses /16 so it can route to all client ranges across the cluster. 
+# SUBNET_INDEX 0 → client_ip_range 10.8.0.0/18 (10.8.0.0 ~ 10.8.63.255)
+# SUBNET_INDEX 1 → client_ip_range 10.8.64.0/18 (10.8.64.0 ~ 10.8.127.255)
+# SUBNET_INDEX 2 → client_ip_range 10.8.128.0/18 (10.8.128.0 ~ 10.8.191.255)
+# SUBNET_INDEX 3 → client_ip_range 10.8.192.0/18 (10.8.192.0 ~ 10.8.255.255)
+WG_THIRD_OCTET=$((SUBNET_INDEX * 64))
+WG_IP="10.8.${WG_THIRD_OCTET}.1/16"
+WG_RESERVED_NET="10.8.${WG_THIRD_OCTET}.1/32"
+WG_CLIENT_RANGE="10.8.${WG_THIRD_OCTET}.0/18"
 
 # Calculate listen port for proxy
 if [ "${GATEWAY_SERVING_NUM_PORTS:-1}" -gt 1 ]; then
@@ -278,137 +284,9 @@ fi
 
 $CLI deploy "${DEPLOY_ARGS[@]}"
 
-# Bootstrap Admin RPC configuration
-# Wait for the gateway admin API to be ready, then configure DNS credentials, domain, and certbot
-vmm_curl() {
-    # Calls VMM RPC via curl, handling both unix socket and HTTP URLs
-    local path="$1"; shift
-    if [[ "$VMM_RPC" == unix:* ]]; then
-        local sock="${VMM_RPC#unix:}"
-        curl --unix-socket "$sock" -sf "http://localhost$path" "$@"
-    else
-        curl -sf "${VMM_RPC}${path}" "$@"
-    fi
-}
-
-get_vm_id() {
-    # Resolve VM ID from name via VMM Status RPC
-    local vm_name="$1"
-    local status
-    status=$(vmm_curl "/prpc/Status" -X POST -H "Content-Type: application/json" -d '{}' 2>/dev/null) || true
-    echo "$status" | jq -r --arg name "$vm_name" '.vms[]? | select(.name == $name) | .id' 2>/dev/null | head -1
-}
-
-get_admin_addr() {
-    # In bridge mode, get guest IP from VMM guest API and use port 8001
-    # In other modes, use the configured GATEWAY_ADMIN_RPC_ADDR
-    if [ "${NET_MODE:-bridge}" = "bridge" ]; then
-        local vm_id
-        vm_id=$(get_vm_id "dstack-gateway")
-        if [ -z "$vm_id" ]; then
-            echo "WARN: could not find VM ID for dstack-gateway" >&2
-            echo "$GATEWAY_ADMIN_RPC_ADDR"
-            return
-        fi
-        echo "Bridge mode: VM ID=$vm_id, waiting for guest network info..." >&2
-        local max_retries=30
-        local retry=0
-        while [ $retry -lt $max_retries ]; do
-            local net_info
-            net_info=$(vmm_curl "/guest/NetworkInfo" \
-                -X POST -H "Content-Type: application/json" \
-                -d "{\"id\":\"$vm_id\"}" 2>/dev/null) || true
-            if [ -n "$net_info" ]; then
-                local guest_ip
-                guest_ip=$(echo "$net_info" | jq -r '
-                    .interfaces[]? |
-                    select(.name != "lo") |
-                    .addresses[]? |
-                    .address' 2>/dev/null | head -1)
-                if [ -n "$guest_ip" ] && [ "$guest_ip" != "null" ]; then
-                    echo " Guest IP: $guest_ip" >&2
-                    echo "${guest_ip}:8001"
-                    return
-                fi
-            fi
-            retry=$((retry + 1))
-            sleep 5
-        done
-        echo "WARN: could not get guest IP, falling back to $GATEWAY_ADMIN_RPC_ADDR" >&2
-    fi
-    echo "$GATEWAY_ADMIN_RPC_ADDR"
-}
-
-bootstrap_admin() {
-    local admin_addr
-    admin_addr=$(get_admin_addr)
-    local max_retries=60
-    local retry=0
-
-    echo "Waiting for gateway admin API at $admin_addr..."
-    while [ $retry -lt $max_retries ]; do
-        if curl -sf "http://$admin_addr/prpc/Status" >/dev/null 2>&1; then
-            break
-        fi
-        retry=$((retry + 1))
-        sleep 5
-    done
-
-    if [ $retry -eq $max_retries ]; then
-        echo "WARN: admin API not ready after $max_retries retries, skipping bootstrap"
-        echo "You can configure the gateway manually via the Web UI at http://$admin_addr"
-        return
-    fi
-
-    echo "Admin API ready, bootstrapping configuration..."
-
-    # Set ACME URL
-    if [ "$ACME_STAGING" = "yes" ]; then
-        ACME_URL="https://acme-staging-v02.api.letsencrypt.org/directory"
-    else
-        ACME_URL="https://acme-v02.api.letsencrypt.org/directory"
-    fi
-
-    echo "Setting certbot config (ACME URL: $ACME_URL)..."
-    curl -sf -X POST "http://$admin_addr/prpc/SetCertbotConfig" \
-        -H "Content-Type: application/json" \
-        -d '{"acme_url":"'"$ACME_URL"'","renew_interval_secs":3600,"renew_before_expiration_secs":864000,"renew_timeout_secs":300}' >/dev/null \
-        && echo " Certbot config set" || echo " WARN: failed to set certbot config"
-
-    # Create DNS credential if CF_API_TOKEN is provided and no credentials exist yet
-    if [ -n "$CF_API_TOKEN" ]; then
-        existing=$(curl -sf "http://$admin_addr/prpc/ListDnsCredentials" 2>/dev/null)
-        cred_count=$(echo "$existing" | jq -r '.credentials | length' 2>/dev/null || echo "0")
-
-        if [ "$cred_count" = "0" ]; then
-            echo "Creating default DNS credential..."
-            curl -sf -X POST "http://$admin_addr/prpc/CreateDnsCredential" \
-                -H "Content-Type: application/json" \
-                -d '{"name":"cloudflare","provider_type":"cloudflare","cf_api_token":"'"$CF_API_TOKEN"'","set_as_default":true}' >/dev/null \
-                && echo " DNS credential created" || echo " WARN: failed to create DNS credential"
-        else
-            echo " DNS credentials already exist ($cred_count), skipping"
-        fi
-    fi
-
-    # Add ZT-Domain if SRV_DOMAIN is provided and domain doesn't exist yet
-    if [ -n "$SRV_DOMAIN" ]; then
-        existing=$(curl -sf "http://$admin_addr/prpc/ListZtDomains" 2>/dev/null)
-        has_domain=$(echo "$existing" | jq -r '.domains[]? | select(.domain=="'"$SRV_DOMAIN"'") | .domain' 2>/dev/null)
-
-        if [ -z "$has_domain" ]; then
-            echo "Adding ZT-Domain: $SRV_DOMAIN..."
-            curl -sf -X POST "http://$admin_addr/prpc/AddZtDomain" \
-                -H "Content-Type: application/json" \
-                -d '{"domain":"'"$SRV_DOMAIN"'","port":443,"priority":100}' >/dev/null \
-                && echo " ZT-Domain added" || echo " WARN: failed to add ZT-Domain"
-        else
-            echo " ZT-Domain $SRV_DOMAIN already exists, skipping"
-        fi
-    fi
-
-    echo "Bootstrap complete"
-    echo "Gateway Web UI: http://$admin_addr"
-}
-
-bootstrap_admin
+# Run bootstrap-cluster.sh to configure ACME, DNS credentials, and ZT-Domain.
+# This only needs to run once per cluster — additional nodes sync config automatically.
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+echo ""
+echo "To bootstrap admin config (only needed for the first node in a cluster):"
+echo " bash $SCRIPT_DIR/bootstrap-cluster.sh"
From 5c12ec40131f35ad44aa216fe9d18ac09fae808c Mon Sep 17 00:00:00 2001
From: Kevin Wang
Date: Thu, 19 Mar 2026 03:55:41 +0000
Subject: [PATCH 2/2] docs(gateway): fix node planning bootnode and disk sizing

- gateway-1 bootnode: use "none" (matching actual deployment)
- disk sizing: use 20 GB for all workload tiers
---
 gateway/docs/cluster-deployment.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gateway/docs/cluster-deployment.md b/gateway/docs/cluster-deployment.md
index b62a5e42..9441e625 100644
--- a/gateway/docs/cluster-deployment.md
+++ b/gateway/docs/cluster-deployment.md
@@ -78,8 +78,8 @@ Recommended minimum per gateway node:
 | Workload | vCPU | Memory | Disk |
 |----------|------|--------|------|
 | Small (< 100 CVMs) | 4 | 4 GB | 20 GB |
-| Medium (100-1000 CVMs) | 8 | 8 GB | 40 GB |
-| Large (> 1000 CVMs) | 16+ | 16+ GB | 40+ GB |
+| Medium (100-1000 CVMs) | 8 | 8 GB | 20 GB |
+| Large (> 1000 CVMs) | 16+ | 16+ GB | 20 GB |
 
 ### Networking Modes
 
@@ -108,7 +108,7 @@ bridge = "dstack-br0"
 
 | Node | node_id | Gateway IP | Client IP range | bootnode |
 |------|---------|------------|-----------------|----------|
-| gateway-1 | 1 | 10.8.0.1/16 | 10.8.0.0/18 | (none or gateway-2) |
+| gateway-1 | 1 | 10.8.0.1/16 | 10.8.0.0/18 | none |
 | gateway-2 | 2 | 10.8.64.1/16 | 10.8.64.0/18 | gateway-1 |
 
 Notes: