This document describes how to deploy the RadSysX BiomedParse backend on a cloud VM with NVIDIA GPU.
- Framework: FastAPI (Python)
- GPU: NVIDIA CUDA runtime (Docker-based)
- Inference: BiomedParse v2 (3D-enabled) via `backend/server.py` and `backend/biomedparse_api.py`
- API base path: `/api/biomedparse/v1`
- Static artifacts: `/files/*` -> `backend/tmp/biomedparse`
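Given the base path and static-artifact route above, a client can construct artifact URLs directly. A minimal sketch (the helper name `artifact_url` is illustrative, not part of the backend):

```python
from urllib.parse import quote

API_BASE = "/api/biomedparse/v1"   # API base path from the overview above
FILES_BASE = "/files"              # static artifacts route from the overview above

def artifact_url(host: str, name: str) -> str:
    """Build the public URL for an artifact stored under backend/tmp/biomedparse.

    `host` is e.g. "http://<VM_IP>:8000"; `name` is a file name returned by
    the API (e.g. a seg_*.npz artifact).
    """
    return f"{host}{FILES_BASE}/{quote(name)}"
```
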
- A cloud VM with an NVIDIA GPU (e.g., 12 GB VRAM or more recommended).
- SSH access to the VM.
- Docker installed.
- NVIDIA Container Toolkit installed for GPU inside Docker.
- The BiomedParse 3D checkpoint file available on the VM (path used by `BP3D_CKPT`).
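The prerequisites above can be spot-checked from the host before deploying. A minimal sketch (this only confirms the binaries are on PATH; it does not verify driver versions or the NVIDIA Container Toolkit configuration):

```python
import shutil

def preflight() -> dict:
    """Report which host prerequisites are discoverable on PATH."""
    return {
        "docker": shutil.which("docker") is not None,      # Docker installed
        "nvidia-smi": shutil.which("nvidia-smi") is not None,  # NVIDIA driver present
    }
```
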
References:
- NVIDIA Container Toolkit installation: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html
- Docker installation: https://docs.docker.com/get-docker/
- SSH into your VM.
- Clone the repository:

  ```bash
  git clone https://github.com/<your-org-or-user>/RadSysX.git
  cd RadSysX
  ```

- Place your 3D checkpoint on the VM and note the absolute path. For example:

  ```bash
  mkdir -p /opt/weights
  cp /path/to/biomedparse_3D_AllData_MultiView_edge.ckpt /opt/weights/
  ```

- Build the GPU image:

  ```bash
  docker build -t radsysx-backend:gpu -f backend/Dockerfile .
  ```

- Start the container with GPU access (mapping weights into the container) and environment variables:
  ```bash
  docker run --gpus all -p 8000:8000 \
    -e BP3D_CKPT=/weights/biomedparse_3D_AllData_MultiView_edge.ckpt \
    -e BP_TMP_TTL=7200 -e BP_TMP_SWEEP=1800 -e BP_VALIDATE_HEATMAP=1 \
    -v /opt/weights:/weights \
    radsysx-backend:gpu
  ```

- Verify the service:

  ```bash
  curl http://<VM_IP>:8000/api/biomedparse/v1/health
  ```

  If you see `{ "status": "healthy", "gpu_available": true }`, the API is up with GPU.
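For scripted deployments, the health payload above can be checked programmatically. A minimal sketch (fetch the body with `curl` or `urllib.request` in practice; this parser is illustrative, not part of the backend):

```python
import json

def gpu_ready(health_body: str) -> bool:
    """Return True when the /health JSON reports a healthy API with a GPU.

    `health_body` is the raw JSON string returned by the health endpoint.
    """
    payload = json.loads(health_body)
    return payload.get("status") == "healthy" and bool(payload.get("gpu_available"))
```
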
- Open interactive docs in a browser:
http://<VM_IP>:8000/docs
- Health:

  ```bash
  curl -s http://<VM_IP>:8000/api/biomedparse/v1/health | jq .
  ```

- 2D predict example (PNG/JPG):

  ```bash
  curl -s -X POST \
    -F "file=@/path/to/example.png" \
    -F "prompts=liver" \
    "http://<VM_IP>:8000/api/biomedparse/v1/predict-2d?threshold=0.5&return_heatmap=true" | jq .
  ```

- 3D predict (NIfTI):

  ```bash
  curl -s -X POST \
    -F "file=@/path/to/volume.nii.gz" \
    -F "prompts=liver" \
    "http://<VM_IP>:8000/api/biomedparse/v1/predict-3d-nifti?return_heatmap=true" | jq .
  ```

- Fetch NPZ artifacts (for debugging):

  ```bash
  curl -s "http://<VM_IP>:8000/api/biomedparse/v1/fetch-npz?name=seg_XXXX.npz&key=seg" | jq .
  curl -s "http://<VM_IP>:8000/api/biomedparse/v1/fetch-npz?name=prob_YYYY.npz&key=prob" | jq .
  ```

Set these in your `.env` or pass with `-e` in `docker run`.
```bash
# Required: absolute path inside the CONTAINER to the 3D checkpoint
BP3D_CKPT=/weights/biomedparse_3D_AllData_MultiView_edge.ckpt
# Transient artifact TTL and sweep (seconds)
BP_TMP_TTL=7200
BP_TMP_SWEEP=1800
# Validate that heatmap NPZ contains key 'prob' as uint8 (1=on, 0=off)
BP_VALIDATE_HEATMAP=1
# Optional: force slice batch size; otherwise auto-tuned by available VRAM
#BP_SLICE_BATCH_SIZE=4
```

Notes:
- Temp files are saved under `backend/tmp/biomedparse` and served via `/files/*`.
- The cleanup daemon purges `.npz` artifacts older than `BP_TMP_TTL` every `BP_TMP_SWEEP` seconds.
- When using Docker, prefer a plain `.env` file (not `.env.local`). Load it with `--env-file .env` or map individual variables with `-e` flags.
- Ensure your cloud firewall/security groups open TCP port `8000` to authorized client IPs only (and any other ports you expose). On the VM itself, allow the same in the OS firewall if enabled.
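The cleanup behavior described in the notes (purge `.npz` artifacts older than `BP_TMP_TTL` every `BP_TMP_SWEEP` seconds) can be sketched as follows. This is an illustration of the policy only; the real daemon lives in the backend:

```python
import os
import time
from pathlib import Path

def sweep_npz(tmp_dir: str, ttl_seconds: int) -> int:
    """Delete .npz artifacts whose mtime is older than ttl_seconds.

    Mirrors the TTL policy from the notes above; call it every
    BP_TMP_SWEEP seconds from a background loop.
    """
    removed = 0
    now = time.time()
    for path in Path(tmp_dir).glob("*.npz"):
        if now - path.stat().st_mtime > ttl_seconds:
            path.unlink()
            removed += 1
    return removed
```
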
- Install Python 3.10+, CUDA drivers, and a CUDA-enabled PyTorch.
- Install dependencies:

  ```bash
  pip install fastapi "uvicorn[standard]" python-multipart pydantic numpy pillow nibabel pydicom hydra-core omegaconf python-dotenv
  pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
  ```

- Export environment variables (see the section above) and run the API:

  ```bash
  uvicorn backend.server:app --host 0.0.0.0 --port 8000
  ```

- Out-of-memory (OOM): lower `BP_SLICE_BATCH_SIZE` or pass `?slice_batch_size=...` on 3D endpoints.
- GPU not detected: verify `nvidia-smi` on the host and that the container runs with `--gpus all`.
- Missing checkpoint: ensure `BP3D_CKPT` points to a readable file inside the container (mount it with `-v`).
- Heatmap NPZ validation errors: set `BP_VALIDATE_HEATMAP=1` (default) and ensure the artifact contains `prob` (uint8).
- CORS: the server currently allows all origins; restrict in production (edit `backend/server.py`).
- Security: only expose port 8000 to authorized IPs; consider a reverse proxy with auth for internet-facing deployments.
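To debug heatmap NPZ validation errors locally, the check that `BP_VALIDATE_HEATMAP=1` enforces (key `prob` stored as uint8) can be reproduced with numpy, which is already in the dependency list above. A minimal sketch, not the backend's actual validator:

```python
import numpy as np

def validate_heatmap_npz(path: str) -> None:
    """Raise ValueError unless the NPZ contains key 'prob' as uint8."""
    with np.load(path) as data:
        if "prob" not in data:
            raise ValueError("heatmap NPZ missing key 'prob'")
        if data["prob"].dtype != np.uint8:
            raise ValueError(f"'prob' must be uint8, got {data['prob'].dtype}")
```

Run it against an artifact fetched from `/files/*` to see why the server-side validation rejected it.
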