feat(profiler): per-board cycle models (Artix-7 + V80 board configs) in the ISA profiler#87

Merged: booth-algo merged 2 commits into main from feat/board-configs-profiler on Jun 3, 2026

@booth-algo

Collaborator

Adds per-board cycle modelling to the ISA profiler: a program can now be scored against a specific FPGA's per-op latencies via python isa_analysis.py <asm> --board nexys_a7, instead of the single plena_settings.toml model.
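
A minimal sketch of how an invocation like the one above could be parsed, assuming `--board` is optional and that omitting it keeps the existing plena_settings.toml model. The function names `build_parser` and `describe_model` are hypothetical and are not taken from isa_analysis.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the usage shown in the description."""
    parser = argparse.ArgumentParser(prog="isa_analysis.py")
    parser.add_argument("asm", help="path to the assembly program to profile")
    parser.add_argument(
        "--board",
        default=None,  # assumption: the flag is optional
        help="board config name, e.g. nexys_a7 or v80",
    )
    return parser


def describe_model(argv: list[str]) -> str:
    """Report which cycle model a given invocation would select."""
    args = build_parser().parse_args(argv)
    if args.board is None:
        return "default model (plena_settings.toml)"
    return f"board model ({args.board})"
```

Keeping the flag optional is what lets existing invocations behave exactly as before, which matches the claim that this only adds a new config source.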

What's included

  • board_configs/nexys_a7.yaml — Artix-7 XC7A200T (Nexys Video; note the filename says nexys_a7 but the part is xc7a200t, matching the RTL synthesis target) — DDR3-1600 x16 512 MiB memory model, per-op compute latencies, MXFP8 E4M3 / BF16 precision.
  • board_configs/v80.yaml — Alveo V80.
  • isa_analysis.py: _BOARD_CONFIG_DIR, load_board_config(), cycle_model_from_board(), and a --board CLI (main()), built on the existing SimulatorCycleModel / analyze_asm.
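
As a rough illustration of the pieces listed above, the sketch below maps a board name to a YAML path and pulls the per-op latencies out of an already-parsed config. It is not the PR's implementation: `BOARD_CONFIG_DIR`, `board_config_path`, and `latency_table` are hypothetical stand-ins, the config is passed in as a plain dict (what a YAML loader would return), and the opcode names and cycle counts are placeholders rather than values from nexys_a7.yaml or v80.yaml.

```python
from pathlib import Path

# Hypothetical config directory; the PR's _BOARD_CONFIG_DIR may point elsewhere.
BOARD_CONFIG_DIR = Path("board_configs")


def board_config_path(board: str) -> Path:
    """Map a board name such as 'nexys_a7' to its YAML file."""
    return BOARD_CONFIG_DIR / f"{board}.yaml"


def latency_table(config: dict) -> dict[str, int]:
    """Extract per-op latencies from an already-parsed board config.

    Only the 'latency' section is read; every other section
    (memory model, precision, ...) is ignored.
    """
    latency = config.get("latency")
    if not isinstance(latency, dict):
        raise ValueError("board config has no 'latency' section")
    return {op: int(cycles) for op, cycles in latency.items()}


# Placeholder config: opcode names and cycle counts are made up.
example = {
    "memory": {"type": "DDR3-1600"},
    "latency": {"OP_A": 32, "OP_B": 4},
}
```

Calling `latency_table(example)` returns only the two latency entries, so the memory section has no effect on the resulting table.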

Only the board's compute cost model (the latency: section) is consumed. Async memory timing (H_PREFETCH/H_STORE) is left uncharged, exactly as load_behavior_cycle_model does today — so this changes no existing profiler behaviour, it only adds a new config source.
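
The charging rule described above can be sketched as follows, assuming a program is reduced to a flat list of opcodes and each charged opcode costs a fixed number of cycles. `score_ops` and the `OP_A`/`OP_B` opcodes are hypothetical; the real SimulatorCycleModel may account for cost differently.

```python
from collections import Counter

# Async memory ops named in the description; they are left uncharged.
UNCHARGED_OPS = frozenset({"H_PREFETCH", "H_STORE"})


def score_ops(ops: list[str], latency: dict[str, int]) -> dict[str, int]:
    """Return total cycles per opcode for a flat list of executed opcodes.

    Opcodes in UNCHARGED_OPS cost zero cycles; any other opcode missing
    from `latency` raises KeyError.
    """
    totals: dict[str, int] = {}
    for op, count in Counter(ops).items():
        per_op = 0 if op in UNCHARGED_OPS else latency[op]
        totals[op] = per_op * count
    return totals


# Placeholder opcodes and latencies, chosen only for illustration.
ops = ["OP_A", "H_PREFETCH", "OP_A", "OP_B", "H_STORE"]
breakdown = score_ops(ops, {"OP_A": 32, "OP_B": 4})
total_cycles = sum(breakdown.values())
```

Here the two async ops still appear in the breakdown, with a cost of zero, so the total reflects compute latency only.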

Deliberately excluded (from the WIP profiling branch / sim #58)

This extracts just the board-config + profiler bits. Per request, it intentionally leaves out the following from that branch:

  • Docs: memory-footprint-and-streaming.md, SMOLVLM2_ISA_PROFILE.md.
  • The Rust DDR3/streaming memory model (lib/memory/src/streaming.rs, lib.rs, src/cli.rs, src/main.rs) and the serde/serde_json Cargo deps that pair with it.
  • The compare-harness edits, which on that branch still use the pre-ATEN_OPS_UNROLL env names (it forked before #56) and would revert main's rename.

No submodule pointer change.

Verification

Ran the profiler against a real 1-layer decoder ASM (native_32x32x4) for both boards. Each run produces a full per-op cycle breakdown with no errors, and ruff reports no issues.

…isa_analysis.py

Adds board_configs/nexys_a7.yaml (Artix-7 XC7A200T / Nexys Video) and v80.yaml (Alveo V80), plus load_board_config / cycle_model_from_board / a --board CLI in the ISA profiler, so a program's cycle cost can be scored against a specific FPGA's per-op latencies instead of plena_settings.toml. Only the board's compute cost model (the latency: section) is consumed; async memory timing (H_PREFETCH/H_STORE) stays uncharged, matching the behaviour simulator's async memory model. Extracted from the WIP profiling branch (sim PR #58) — deliberately excludes that branch's docs, the Rust DDR3/streaming memory model (lib/memory/streaming.rs, cli.rs, main.rs) and its serde Cargo deps, and the stale ATEN_UNROLL compare-harness edits that predate the ATEN_OPS_UNROLL rename. Verified: the profiler runs against a real decoder ASM for both nexys_a7 and v80.
@booth-algo booth-algo merged commit b108a15 into main Jun 3, 2026
4 checks passed
@booth-algo booth-algo deleted the feat/board-configs-profiler branch June 3, 2026 16:16