| 1 | +# Nsight Compute Profiling |
| 2 | + |
| 3 | +Profile your kernels directly from the CLI and get detailed Nsight Compute metrics. This is particularly useful for the NVIDIA NVFP4 Blackwell competition where you need to optimize tensor core utilization. |
| 4 | + |
| 5 | +**Note:** Profiling is currently only available for the NVFP4 Blackwell competition. Modal, which we use for other competitions, does not support Nsight Compute (NCU). |
| 6 | + |
| 7 | +## Quick Start |
| 8 | + |
| 9 | +```bash |
| 10 | +popcorn-cli submit submission.py --leaderboard nvfp4_dual_gemm --gpu NVIDIA --mode profile --no-tui |
| 11 | +``` |
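If you profile several submissions or leaderboards, the command above can be assembled programmatically. The following is a minimal sketch, assuming `popcorn-cli` is on your `PATH`. The helper name `build_profile_command` is hypothetical and not part of popcorn-cli; it uses only the flags shown in the Quick Start command.

```python
import shlex


def build_profile_command(submission, leaderboard, gpu="NVIDIA"):
    """Return the argv list for a profiling submission.

    Illustrative helper (not part of popcorn-cli); it only reuses the
    flags that appear in the Quick Start command.
    """
    return [
        "popcorn-cli", "submit", submission,
        "--leaderboard", leaderboard,
        "--gpu", gpu,
        "--mode", "profile",
        "--no-tui",
    ]


# Build the Quick Start command and print it as a shell-safe string.
command = build_profile_command("submission.py", "nvfp4_dual_gemm")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command)` to launch the profiling run from a script.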
| 12 | + |
| 13 | +## Expected Output |
| 14 | + |
| 15 | +The profiler returns three key metric tables for each benchmark: |
| 16 | + |
| 17 | +**GPU Throughput** - Overall utilization: |
| 18 | +``` |
| 19 | +Metric Name       Metric Unit  Metric Value |
| 20 | +----------------  -----------  ------------ |
| 21 | +Memory [%]        %            32.48 |
| 22 | +Compute (SM) [%]  %            13.23 |
| 23 | +``` |
| 24 | + |
| 25 | +**Pipe Utilization** - Which pipelines are active: |
| 26 | +``` |
| 27 | +Metric Name           Metric Unit  Metric Value |
| 28 | +--------------------  -----------  ------------ |
| 29 | +TC                    %            16.67 |
| 30 | +TMEM (Tensor Memory)  %            15.27 |
| 31 | +Tensor (FP)           %            12.58 |
| 32 | +ALU                   %            2.38 |
| 33 | +TMA                   %            0.29 |
| 34 | +``` |
| 35 | + |
| 36 | +**Warp State** - Where your warps are stalling: |
| 37 | +``` |
| 38 | +Metric Name               Metric Unit  Metric Value |
| 39 | +------------------------  -----------  ------------ |
| 40 | +Stall Long Scoreboard     inst         18.31 |
| 41 | +Stall Wait                inst         1.88 |
| 42 | +Stall Short Scoreboard    inst         1.23 |
| 43 | +Selected                  inst         1.00 |
| 44 | +Stall Barrier             inst         0.75 |
| 45 | +``` |
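To compare runs, it can help to turn these tables into data and rank the stall reasons. The sketch below is one possible approach, not something the CLI provides. It assumes each row ends with the unit followed by the value, as in the samples above, and `parse_metric_table` is a hypothetical helper name.

```python
def parse_metric_table(text):
    """Parse 'Metric Name / Metric Unit / Metric Value' rows into a dict.

    Assumption: the last token of a row is the value, the token before it
    is the unit, and everything earlier is the metric name.
    """
    metrics = {}
    for line in text.strip().splitlines():
        tokens = line.split()
        if len(tokens) < 3 or tokens[0].startswith("-"):
            continue  # skip blank rows and the dashed separator
        try:
            value = float(tokens[-1].replace(",", ""))
        except ValueError:
            continue  # skip the header row, whose last token is not a number
        metrics[" ".join(tokens[:-2])] = (value, tokens[-2])
    return metrics


# Sample copied from the Warp State table above.
WARP_STATE = """
Metric Name               Metric Unit  Metric Value
------------------------  -----------  ------------
Stall Long Scoreboard     inst         18.31
Stall Wait                inst         1.88
Stall Short Scoreboard    inst         1.23
Selected                  inst         1.00
Stall Barrier             inst         0.75
"""

metrics = parse_metric_table(WARP_STATE)
stalls = {name: value for name, (value, _) in metrics.items() if name.startswith("Stall")}
top = max(stalls, key=stalls.get)
print(f"Dominant stall: {top} ({stalls[top]})")
```

For the sample above, the dominant entry is `Stall Long Scoreboard`. In Nsight Compute's terminology, that stall reason generally means warps are waiting on L1TEX (global, local, surface, or texture) memory operations, so memory access patterns are a reasonable first place to look.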
| 46 | + |
| 47 | +## Trace Files |
| 48 | + |
| 49 | +After profiling, a zip file is saved to your current directory: |
| 50 | +``` |
| 51 | +profile_20260113_031052_run0.zip |
| 52 | +``` |
| 53 | + |
| 54 | +This contains a `.ncu-rep` file (the full Nsight Compute report): |
| 55 | +``` |
| 56 | +$ unzip -l profile_20260113_031052_run0.zip |
| 57 | +  Length      Date    Time    Name |
| 58 | +---------  ---------- -----   ---- |
| 59 | +  2178383  01-13-2026 03:10   profile.ncu-rep |
| 60 | +``` |
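If you collect many profiling runs, the extraction step can be scripted. The following is a minimal sketch using Python's standard `zipfile` module. The helper name `extract_ncu_reports` is hypothetical, and it assumes reports can be recognized by the `.ncu-rep` suffix shown in the listing above.

```python
import zipfile
from pathlib import Path


def extract_ncu_reports(zip_path, dest_dir="."):
    """Extract every .ncu-rep member of the archive and return the paths.

    Illustrative helper: other files in the archive are left untouched.
    """
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.namelist():
            if member.endswith(".ncu-rep"):
                # ZipFile.extract returns the path of the extracted file.
                extracted.append(Path(archive.extract(member, path=dest_dir)))
    return extracted


# Example usage (the archive name comes from the listing above):
# reports = extract_ncu_reports("profile_20260113_031052_run0.zip", "reports")
```

Extracting into a separate directory per run avoids overwriting an earlier `profile.ncu-rep`, since the listing above suggests each archive uses the same report file name.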
| 61 | + |
| 62 | +Extract the archive, then open the report in the Nsight Compute GUI for detailed analysis: |
| 63 | +```bash |
| 64 | +unzip profile_20260113_031052_run0.zip && ncu-ui profile.ncu-rep |
| 65 | +``` |