Profiler for LLM inference.
hotpath profiles live vLLM and SGLang servers, analyzes request and GPU behavior, and recommends when to split prefill and decode.
- Profile a live endpoint with real traffic
- Analyze queueing, prefill, decode, cache, and batching
- Recommend disaggregation and generate deployment configs

Install:

    uv tool install hotpath

Profile a live vLLM server:

    hotpath serve-profile \
      --endpoint http://localhost:8000 \
      --traffic prompts.jsonl \
      --concurrency 4 \
      --duration 60 \
      --output .hotpath/run

View the report:

    hotpath serve-report .hotpath/run/serve_profile.db

Generate deployment configs:

    hotpath disagg-config .hotpath/run/serve_profile.db --format all

If you want server-side request timing, start vLLM with debug logs and pass the log file:

    VLLM_LOGGING_LEVEL=DEBUG vllm serve <model> 2>vllm.log &
    hotpath serve-profile \
      --endpoint http://localhost:8000 \
      --traffic prompts.jsonl \
      --server-log vllm.log \
      --concurrency 4 \
      --duration 60

If you want kernel-level GPU traces, add `--nsys`:

    hotpath serve-profile \
      --endpoint http://localhost:8000 \
      --traffic prompts.jsonl \
      --nsys

The traffic file is JSONL, one request per line:

    {"prompt": "Explain KV cache eviction policy.", "max_tokens": 256}
    {"prompt": "Write a Python retry decorator with exponential backoff.", "max_tokens": 400}

ShareGPT format is also supported.
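
A traffic file in the JSONL shape shown above can be generated with a few lines of Python. This is a minimal sketch, not part of hotpath: the helper names `write_traffic` and `read_traffic` are hypothetical, and the check that every line carries both `prompt` and `max_tokens` is an assumption based on the two example lines, so it may be stricter than what hotpath actually requires.

```python
import json
from pathlib import Path


def write_traffic(path, requests):
    """Write one JSON object per line from (prompt, max_tokens) pairs."""
    path = Path(path)
    with path.open("w", encoding="utf-8") as f:
        for prompt, max_tokens in requests:
            record = {"prompt": prompt, "max_tokens": int(max_tokens)}
            # json.dumps escapes embedded newlines, so each request stays on one line.
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


def read_traffic(path):
    """Read the file back and verify the two fields used in the examples."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            # Assumption: both fields are expected on every line.
            missing = {"prompt", "max_tokens"} - record.keys()
            if missing:
                raise ValueError(f"line {line_no}: missing {sorted(missing)}")
            records.append(record)
    return records
```

For example, `write_traffic("prompts.jsonl", [("Explain KV cache eviction policy.", 256)])` produces a file that can be passed to `--traffic`. Reading the file back with `read_traffic` before a long profiling run is a cheap way to catch a malformed line early.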

| Command | Description |
|---|---|
| `serve-profile` | Profile a live vLLM or SGLang server |
| `serve-report` | Print a serving analysis report |
| `disagg-config` | Generate deployment configs for disaggregated serving |
| `profile` | Run GPU kernel profiling under RL-style traffic |
| `report` | View a saved kernel profile |
| `diff` | Compare two kernel profiles |
| `bench` | Benchmark individual GPU kernel implementations |
| `export` | Export profile data to JSON, CSV, or OTLP |
| `doctor` | Check local profiling environment |
| `lock-clocks` | Lock GPU clocks for reproducible measurements |

- Linux
- NVIDIA GPU with CUDA driver
- `nsys` for kernel profiling
- vLLM or SGLang for serving analysis

Build and run the tests:

    cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
    cmake --build build --parallel
    ctest --test-dir build --output-on-failure

Install from source:

    uv tool install .

Build requirements: CMake 3.28+, a C++20 compiler, and SQLite3.

hotpath stores results in SQLite and combines three data sources:
- Kernel traces from `nsys`
- Server metrics from `/metrics`
- Request lifecycle timing from client traces and vLLM debug logs

The report turns those signals into latency breakdowns, cache analysis, prefix-sharing analysis, and a disaggregation recommendation.
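
Because the results live in an ordinary SQLite file, they can also be inspected directly. The sketch below is an illustration under stated assumptions, not a hotpath API: the table names are not documented here, so the hypothetical helper `list_tables` discovers them from SQLite's own `sqlite_master` catalog rather than assuming any schema.

```python
import sqlite3
from pathlib import Path


def list_tables(db_path):
    """Return {table_name: row_count} for every user table in the database."""
    # mode=ro opens the file read-only, so inspection cannot alter the profile.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
        counts = {}
        for (name,) in rows:
            if name.startswith("sqlite_"):
                continue  # skip SQLite's internal bookkeeping tables
            # Quote the identifier so unusual table names cannot break the query.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                f"SELECT COUNT(*) FROM {quoted}"
            ).fetchone()[0]
        return counts
    finally:
        conn.close()
```

Calling `list_tables(".hotpath/run/serve_profile.db")` after a profiling run shows which tables were written and how many rows each holds, which is a reasonable starting point before writing ad hoc queries.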
See CHANGELOG.md.