Commit 1f10700

Profiling support (#27)
1 parent 02c089d commit 1f10700

7 files changed

Lines changed: 309 additions & 16 deletions

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
 submission.*
 target/
 scratch.md
+*claude
+*.zip

Cargo.lock

Lines changed: 116 additions & 0 deletions

Cargo.toml

Lines changed: 2 additions & 0 deletions
@@ -19,6 +19,8 @@ dirs = "5.0"
 serde_yaml = "0.9"
 webbrowser = "0.8"
 base64-url = "3.0.0"
+base64 = "0.22"
+chrono = "0.4"
 urlencoding = "2.1.3"
 bytes = "1.10.1"
 futures-util = "0.3.31"

README.md

Lines changed: 4 additions & 0 deletions
@@ -5,6 +5,10 @@ A command-line interface tool for submitting solutions to the [Popcorn Discord B
 
 Tested on linux and mac but should just work on Windows as well.
 
+## New: Nsight Compute Profiling
+
+Profile your kernels with `--mode profile` and get detailed metrics. Currently only available for the NVFP4 Blackwell competition (Modal, which we use for other competitions, does not support NCU). See [docs/profiling.md](docs/profiling.md) for details.
+
 ## Installation
 
 ### Option 1: Using pre-built binaries (Recommended)

docs/profiling.md

Lines changed: 65 additions & 0 deletions
@@ -0,0 +1,65 @@
+# Nsight Compute Profiling
+
+Profile your kernels directly from the CLI and get detailed Nsight Compute metrics. This is particularly useful for the NVIDIA NVFP4 Blackwell competition where you need to optimize tensor core utilization.
+
+**Note:** Profiling is currently only available for the NVFP4 Blackwell competition. Modal, which we use for other competitions, does not support NCU.
+
+## Quick Start
+
+```bash
+popcorn-cli submit submission.py --leaderboard nvfp4_dual_gemm --gpu NVIDIA --mode profile --no-tui
+```
+
+## Expected Output
+
+The profiler returns three key metric tables for each benchmark:
+
+**GPU Throughput** - Overall utilization:
+```
+Metric Name      Metric Unit Metric Value
+---------------- ----------- ------------
+Memory [%]       %                  32.48
+Compute (SM) [%] %                  13.23
+```
+
+**Pipe Utilization** - Which pipelines are active:
+```
+Metric Name          Metric Unit Metric Value
+-------------------- ----------- ------------
+TC                   %                  16.67
+TMEM (Tensor Memory) %                  15.27
+Tensor (FP)          %                  12.58
+ALU                  %                   2.38
+TMA                  %                   0.29
+```
+
+**Warp State** - Where your warps are stalling:
+```
+Metric Name              Metric Unit Metric Value
+------------------------ ----------- ------------
+Stall Long Scoreboard    inst               18.31
+Stall Wait               inst                1.88
+Stall Short Scoreboard   inst                1.23
+Selected                 inst                1.00
+Stall Barrier            inst                0.75
+```
+
+## Trace Files
+
+After profiling, a zip file is saved to your current directory:
+```
+profile_20260113_031052_run0.zip
+```
+
+This contains a `.ncu-rep` file (the full Nsight Compute report):
+```
+$ unzip -l profile_20260113_031052_run0.zip
+  Length      Date    Time    Name
+---------  ---------- -----   ----
+  2178383  01-13-2026 03:10   profile.ncu-rep
+```
+
+You can open this file in the Nsight Compute GUI for detailed analysis:
+```bash
+ncu-ui profile.ncu-rep
+```
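
Reviewer note on docs/profiling.md: the metric tables documented above are plain whitespace-separated text, so they could be post-processed by a script. The following is a minimal sketch in Rust, not part of this commit, that parses such a table and picks out the largest stall reason. The names `MetricRow`, `parse_metric_table`, and `dominant_stall` are hypothetical, and the parser assumes that the last token on each row is the numeric value and the second-to-last is the unit, which matches the sample output shown in the docs but is not a documented format guarantee.

```rust
/// One row of an Nsight Compute style metric table (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
struct MetricRow {
    name: String,
    unit: String,
    value: f64,
}

/// Parse whitespace-separated table text into rows.
/// Assumption: the last token on a line is the numeric value, the
/// second-to-last is the unit, and everything before is the metric name.
/// Header and separator lines are skipped because their last token does
/// not parse as a number.
fn parse_metric_table(text: &str) -> Vec<MetricRow> {
    text.lines()
        .filter_map(|line| {
            let tokens: Vec<&str> = line.split_whitespace().collect();
            if tokens.len() < 3 {
                return None;
            }
            let value: f64 = tokens[tokens.len() - 1].parse().ok()?;
            let unit = tokens[tokens.len() - 2].to_string();
            let name = tokens[..tokens.len() - 2].join(" ");
            Some(MetricRow { name, unit, value })
        })
        .collect()
}

/// Return the "Stall ..." row with the largest value, if any.
fn dominant_stall(rows: &[MetricRow]) -> Option<&MetricRow> {
    rows.iter()
        .filter(|row| row.name.starts_with("Stall "))
        .max_by(|a, b| a.value.total_cmp(&b.value))
}

fn main() {
    // Sample text copied from the Warp State table in docs/profiling.md.
    let warp_state = "\
Metric Name Metric Unit Metric Value
------------------------ ----------- ------------
Stall Long Scoreboard inst 18.31
Stall Wait inst 1.88
Stall Short Scoreboard inst 1.23
Selected inst 1.00
Stall Barrier inst 0.75
";
    let rows = parse_metric_table(warp_state);
    if let Some(top) = dominant_stall(&rows) {
        println!("largest stall: {} ({} {})", top.name, top.value, top.unit);
    }
}
```

Splitting from the right keeps multi-word metric names such as `TMEM (Tensor Memory)` intact. Values printed with thousands separators would not parse with this sketch and would be silently skipped.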

src/cmd/submit.rs

Lines changed: 1 addition & 1 deletion
@@ -67,7 +67,7 @@ impl App {
             ),
             SubmissionModeItem::new(
                 "Profile".to_string(),
-                "Profile is currently supported only via Discord. We'll add this feature to the CLI soon.".to_string(),
+                "Profile the solution using Nsight Compute (NVIDIA) or rocPROF (AMD). Downloads profiling data to current directory.".to_string(),
                 "profile".to_string(),
             ),
         ];
