feat(profiler): per-board cycle models (Artix-7 + V80 board configs) in the ISA profiler #87
Merged
…isa_analysis.py Adds board_configs/nexys_a7.yaml (Artix-7 XC7A200T / Nexys Video) and v80.yaml (Alveo V80), plus load_board_config / cycle_model_from_board / a --board CLI in the ISA profiler, so a program's cycle cost can be scored against a specific FPGA's per-op latencies instead of plena_settings.toml. Only the board's compute cost model (the latency: section) is consumed; async memory timing (H_PREFETCH/H_STORE) stays uncharged, matching the behaviour simulator's async memory model. Extracted from the WIP profiling branch (sim PR #58) — deliberately excludes that branch's docs, the Rust DDR3/streaming memory model (lib/memory/streaming.rs, cli.rs, main.rs) and its serde Cargo deps, and the stale ATEN_UNROLL compare-harness edits that predate the ATEN_OPS_UNROLL rename. Verified: the profiler runs against a real decoder ASM for both nexys_a7 and v80.
Adds per-board cycle modelling to the ISA profiler: a program can now be scored against a specific FPGA's per-op latencies via `python isa_analysis.py <asm> --board nexys_a7`, instead of the single `plena_settings.toml` model.

What's included
- `board_configs/nexys_a7.yaml`: Artix-7 XC7A200T (Nexys Video; note the filename says `nexys_a7` but the part is `xc7a200t`, matching the RTL synthesis target). DDR3-1600 x16 512 MiB memory model, per-op compute latencies, MXFP8 E4M3 / BF16 precision.
- `board_configs/v80.yaml`: Alveo V80.
- `isa_analysis.py`: `_BOARD_CONFIG_DIR`, `load_board_config()`, `cycle_model_from_board()`, and a `--board` CLI (`main()`), built on the existing `SimulatorCycleModel` / `analyze_asm`.

Only the board's compute cost model (the `latency:` section) is consumed. Async memory timing (`H_PREFETCH` / `H_STORE`) is left uncharged, exactly as `load_behavior_cycle_model` does today, so this changes no existing profiler behaviour; it only adds a new config source.

Deliberately excluded (from the WIP profiling branch / sim #58)
This extracts just the board-config and profiler bits. It intentionally leaves out, per request, that branch's: docs (`memory-footprint-and-streaming.md`, `SMOLVLM2_ISA_PROFILE.md`); the Rust DDR3/streaming memory model (`lib/memory/src/streaming.rs`, `lib.rs`, `src/cli.rs`, `src/main.rs`) and the `serde` / `serde_json` Cargo deps that pair with it; and the compare-harness edits, which on that branch still use the pre-`ATEN_OPS_UNROLL` env names (it forked before #56) and would revert main's rename. No submodule pointer change.

Verification
Ran the profiler against a real 1-layer decoder ASM (`native_32x32x4`) for both boards: it produces a full per-op cycle breakdown with no errors; `ruff` clean.
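
For reviewers who want a feel for the "only `latency:` is consumed, async memory ops stay uncharged" behaviour described above, here is a minimal, self-contained sketch. It is not the code in `isa_analysis.py`: the class `SketchCycleModel`, the helper names, the opcodes `M_MM` and `V_ADD`, the `memory` key, and every latency value are assumptions made for illustration. Only `H_PREFETCH` / `H_STORE` and the `latency:` section name come from this PR. The board config is shown as an already-parsed dict so that the sketch needs no YAML dependency.

```python
from dataclasses import dataclass

# Async memory ops that the PR says are left uncharged.
ASYNC_MEMORY_OPS = frozenset({"H_PREFETCH", "H_STORE"})


@dataclass(frozen=True)
class SketchCycleModel:
    """Hypothetical stand-in for the profiler's cycle model (not the real class)."""

    latencies: dict

    def cost(self, opcode: str) -> int:
        # Async memory timing is not charged, whatever the board's memory section says.
        if opcode in ASYNC_MEMORY_OPS:
            return 0
        return self.latencies[opcode]


def sketch_cycle_model_from_board(board: dict) -> SketchCycleModel:
    """Build a model from the `latency` section only; other sections are ignored."""
    return SketchCycleModel({op: int(cycles) for op, cycles in board["latency"].items()})


def per_op_breakdown(opcodes, model: SketchCycleModel) -> dict:
    """Sum the charged cycles per opcode over a flat opcode sequence."""
    breakdown = {}
    for op in opcodes:
        breakdown[op] = breakdown.get(op, 0) + model.cost(op)
    return breakdown


# Hypothetical parsed board config; values are made up for the example.
board = {
    "name": "example_board",
    "memory": {"kind": "ddr3"},            # present but never read by the sketch
    "latency": {"M_MM": 32, "V_ADD": 4},   # assumed opcodes and cycle counts
}
program = ["H_PREFETCH", "M_MM", "V_ADD", "V_ADD", "H_STORE"]

breakdown = per_op_breakdown(program, sketch_cycle_model_from_board(board))
total = sum(breakdown.values())
print(breakdown, total)
```

In this sketch the two memory ops appear in the breakdown with a cost of 0, which makes it easy to see that they were encountered but not charged; the real profiler may report them differently.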