# Download image tar files from the GitHub Release assets into the images/ directory
# The inference image is split into multiple parts due to size constraints

# Reassemble the inference image
cat images/openmodel-inference.tar.part_* > images/openmodel-inference.tar
# Load all three images
docker load -i images/openmodel-scheduler.tar
docker load -i images/openmodel-inference.tar
docker load -i images/openmodel-foc-bridge.tar
# Verify images are loaded
docker images | grep openmodel
# Clean up split files (optional)
rm -f images/openmodel-inference.tar.part_*
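If you want to convince yourself that split + cat round-trips a file byte-for-byte before loading the reassembled tar, you can sanity-check the scheme on a throwaway file. A minimal sketch (file names here are temporaries; in practice you would compare the reassembled tar against a checksum published with the release, if one is provided):

```shell
# Sketch: demonstrate that split + cat reassembles a file intact.
demo=$(mktemp)
head -c 1M /dev/urandom > "$demo"           # stand-in for a large tar
split -b 300K "$demo" "$demo.part_"         # same naming scheme as the .part_* files
cat "$demo.part_"* > "$demo.joined"         # reassemble, as in the step above
cmp -s "$demo" "$demo.joined" && echo "reassembly OK"
rm -f "$demo" "$demo.part_"* "$demo.joined"
```

The shell glob expands the part files in lexicographic order, which matches the order split created them in, so concatenation restores the original bytes.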
3. Configure Environment Variables
# Copy the template
cp .env.example .env
# Edit environment variables
vim .env
Required Variables
| Variable | Description |
|---|---|
| LOTUS_API_TOKEN | Lotus daemon API token |
| SIDECAR_CONFIG | Configuration filename from config/ directory |
| MODEL_CACHE_DIR | Model cache directory (shared by inference and foc-bridge) |
Optional Variables
| Variable | Default | Description |
|---|---|---|
| HF_ENDPOINT | https://huggingface.co | HuggingFace endpoint |
| HF_TOKEN | empty | HuggingFace token (required for gated models) |
| HF_CACHE_DIR | ~/.cache/huggingface | HuggingFace cache directory |
| FOC_PRIVATE_KEY | empty | FOC chain private key |
| FOC_RPC_URL | calibration testnet | FOC RPC URL |
| FOC_BRIDGE_PORT | 3100 | FOC Bridge port |
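Putting the required and optional variables together, a filled-in .env might look like the following. Every value below is a placeholder, not a real token, path, or address; substitute your own:

```shell
# .env — example values only

# Required
LOTUS_API_TOKEN=replace-with-your-lotus-api-token
SIDECAR_CONFIG=sidecar-prod-test.yaml
MODEL_CACHE_DIR=/data/model-cache

# Optional (defaults shown in the table above apply when unset)
HF_ENDPOINT=https://huggingface.co
HF_TOKEN=
HF_CACHE_DIR=~/.cache/huggingface
FOC_PRIVATE_KEY=
# FOC_RPC_URL defaults to the calibration testnet when unset
FOC_BRIDGE_PORT=3100
```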
4. Choose Configuration
Configuration files under config/ are designed for different scenarios:
| Config File | Scenario | Description |
|---|---|---|
| sidecar-prod-test.yaml | Single GPU test | Uses GPU 0, suitable for deployment verification |
| sidecar-8gpu-multi.yaml | 8 GPU multi-instance | One inference engine per GPU, high throughput (recommended for small models) |
| sidecar-8gpu-tensor.yaml | 8 GPU tensor parallel | Multi-GPU collaboration, suitable for 14B+ large models |
| sidecar-foc.yaml | FOC test | Model download functionality test |
You can copy and modify configuration files to match your environment (GPU count, model selection, Lotus port, Curio log path, yield thresholds, etc.).
Key Configuration Items
- lotus.miner_address: Your miner address
- lotus.api_url: Lotus daemon WebSocket address
- curio.dsn: YugabyteDB connection string
- curio.log_path: Curio log path (required for WinningPoSt detection)
- inference.model: Model to load (e.g., Qwen/Qwen2.5-3B-Instruct)
- inference.multi_gpu.mode: tensor_parallel or multi_instance
- inference.multi_gpu.device_ids: Which GPUs to use
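Mapping the items above onto a copied config might look like this. The YAML structure is inferred from the key names, and every value is a placeholder, so check both against the shipped files under config/ before using it:

```yaml
# my-sidecar.yaml — illustrative only; start from a file under config/
lotus:
  miner_address: f01234                        # your miner address (placeholder)
  api_url: ws://127.0.0.1:1234/rpc/v0          # Lotus daemon WebSocket address
curio:
  dsn: postgres://user:pass@127.0.0.1:5433/yugabyte  # YugabyteDB connection string
  log_path: /tmp/curio.log                     # required for WinningPoSt detection
inference:
  model: Qwen/Qwen2.5-3B-Instruct
  multi_gpu:
    mode: multi_instance                       # or tensor_parallel for 14B+ models
    device_ids: [0, 1, 2, 3]                   # which GPUs to use
```

Once the file lives under config/, select it with SIDECAR_CONFIG=my-sidecar.yaml in .env.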
5. Start Services
# Confirm Curio is running and the log file exists
ls -l /tmp/curio.log
# Start all services
docker compose up -d
# Or start with a specific configuration
SIDECAR_CONFIG=sidecar-8gpu-multi.yaml docker compose up -d
6. Verify Deployment
# Check container status (all should be running/healthy)
docker compose ps
# Check scheduler health
curl http://localhost:9090/health
# Check inference service (model loading may take 1-3 minutes)
curl http://localhost:8000/health
# Check FOC Bridge
curl http://localhost:3100/health
# Send a test inference request
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{ "model": "Qwen/Qwen2.5-3B-Instruct", "messages": [{"role": "user", "content": "Hello!"}] }'
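Since the inference service can take a few minutes to load the model, the health checks above often fail on the first attempt. A small retry helper saves hand-polling; a minimal sketch, where the function name, retry count, and sleep interval are my own choices:

```shell
# Sketch: poll a health endpoint until it answers successfully or we give up.
wait_healthy() {
  local url=$1 tries=${2:-30}
  local i
  for i in $(seq "$tries"); do
    if curl -fsS "$url" >/dev/null 2>&1; then
      echo "healthy: $url"
      return 0
    fi
    sleep 2
  done
  echo "timed out: $url" >&2
  return 1
}

# Example: give the inference service up to ~3 minutes
# wait_healthy http://localhost:8000/health 90
```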
7. Operations
View Logs
# View all logs
docker compose logs -f
# View individual service logs
docker compose logs -f scheduler
docker compose logs -f inference
docker compose logs -f foc-bridge
# View logs from the last 5 minutes
docker compose logs --since 5m
Monitor GPU
watch -n 2 nvidia-smi
Switch Configuration
No need to re-import images. Change SIDECAR_CONFIG in .env and recreate the containers. Note that docker compose restart alone does not re-read .env; use docker compose up -d so the containers are recreated with the new value:
# After modifying SIDECAR_CONFIG in .env
docker compose up -d
# Or specify directly
SIDECAR_CONFIG=sidecar-8gpu-tensor.yaml docker compose up -d