Update codegen to handle the new case of datamov chain#304
Update codegen to handle the new case of datamov chain#304n0thingNoob wants to merge 5 commits intomainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Updates Neura codegen to correctly model/emit DATA_MOV/CTRL_MOV when a routing path contains register steps between link hops (the “datamov chain” case from issue #303).
Changes:
- Added helper to collect register steps between adjacent link time steps.
- Enhanced intermediate hop emission to handle link→reg→link segments via deposit / register transfer(s) / egress.
- Updated synthetic DFG node construction to mirror the materialized instruction stream for these segments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| SmallVector<RegStep, 4> | ||
| collectRegsBetweenLinks(const SmallVector<RegStep, 4> ®s, | ||
| int start_time_step, int end_time_step) const { | ||
| SmallVector<RegStep, 4> segment; | ||
| for (const RegStep ®_step : regs) { | ||
| if (reg_step.time_step <= start_time_step) continue; | ||
| if (reg_step.time_step >= end_time_step) break; | ||
| segment.push_back(reg_step); | ||
| } | ||
| return segment; | ||
| } |
There was a problem hiding this comment.
This function relies on regs being sorted by time_step (due to the break on >= end_time_step). That assumption isn’t documented/guarded here, so future call sites could accidentally pass an unsorted vector and silently drop valid steps. Consider either (a) adding an explicit assertion/check that regs is sorted, or (b) implementing this via lower_bound/upper_bound on the sorted range (and documenting the sorted precondition).
| SmallVector<RegStep, 4> inter_link_regs = | ||
| collectRegsBetweenLinks(regs, links[i - 1].time_step, links[i].time_step); | ||
| if (inter_link_regs.empty()) { | ||
| int time_step = links[i].time_step; | ||
| uint64_t sig = static_cast<uint64_t>(llvm::hash_combine(mid_tile, time_step, cur_link)); | ||
| int hop_id = | ||
| base_mov_id >= 0 ? base_mov_id * 10000 + static_cast<int>(hop_counter) : -1; | ||
| ++hop_counter; | ||
| if (hop_signatures.insert(sig).second) { | ||
| placeRouterHop(topo, mid_tile, time_step, incoming_dir, outgoing_dir, | ||
| /*asCtrlMov=*/IsCtrl, hop_id); | ||
| } | ||
| continue; | ||
| } |
There was a problem hiding this comment.
Router hops are deduplicated via hop_signatures, but the newly emitted deposit / reg-transfer / egress instructions for link→reg→link segments are always placed without any dedupe. If the same (tile, time_step, ...) sequence is encountered multiple times (similar to why hops needed dedupe), this can cause duplicate instruction emission and potentially conflicting programming of the same cycle/resource. If placeDstDeposit / placeRegisterTransfer / placeSrcEgress are not already idempotent, consider adding signature-based dedupe for these emissions (separate signature sets per instruction kind), or make the placement helpers return early when an identical instruction is already present.
| placeDstDeposit(topo, mid_tile, inter_link_regs.front().time_step, | ||
| incoming_dir, inter_link_regs.front().regId, | ||
| /*asCtrlMov=*/IsCtrl, deposit_id); |
There was a problem hiding this comment.
Router hops are deduplicated via hop_signatures, but the newly emitted deposit / reg-transfer / egress instructions for link→reg→link segments are always placed without any dedupe. If the same (tile, time_step, ...) sequence is encountered multiple times (similar to why hops needed dedupe), this can cause duplicate instruction emission and potentially conflicting programming of the same cycle/resource. If placeDstDeposit / placeRegisterTransfer / placeSrcEgress are not already idempotent, consider adding signature-based dedupe for these emissions (separate signature sets per instruction kind), or make the placement helpers return early when an identical instruction is already present.
| placeRegisterTransfer<IsCtrl>(topo, mid_tile, cur.time_step, | ||
| prev.regId, cur.regId, transfer_id); |
There was a problem hiding this comment.
Router hops are deduplicated via hop_signatures, but the newly emitted deposit / reg-transfer / egress instructions for link→reg→link segments are always placed without any dedupe. If the same (tile, time_step, ...) sequence is encountered multiple times (similar to why hops needed dedupe), this can cause duplicate instruction emission and potentially conflicting programming of the same cycle/resource. If placeDstDeposit / placeRegisterTransfer / placeSrcEgress are not already idempotent, consider adding signature-based dedupe for these emissions (separate signature sets per instruction kind), or make the placement helpers return early when an identical instruction is already present.
| placeSrcEgress(topo, mid_tile, links[i].time_step, outgoing_dir, | ||
| inter_link_regs.back().regId, /*asCtrlMov=*/IsCtrl, egress_id); |
There was a problem hiding this comment.
Router hops are deduplicated via hop_signatures, but the newly emitted deposit / reg-transfer / egress instructions for link→reg→link segments are always placed without any dedupe. If the same (tile, time_step, ...) sequence is encountered multiple times (similar to why hops needed dedupe), this can cause duplicate instruction emission and potentially conflicting programming of the same cycle/resource. If placeDstDeposit / placeRegisterTransfer / placeSrcEgress are not already idempotent, consider adding signature-based dedupe for these emissions (separate signature sets per instruction kind), or make the placement helpers return early when an identical instruction is already present.
| SmallVector<RegStep, 4> inter_link_regs = | ||
| collectRegsBetweenLinks(regs, link_steps[i - 1].time_step, | ||
| link_steps[i].time_step); |
There was a problem hiding this comment.
This allocates and fills a new SmallVector for every adjacent link pair. With small regs this is likely fine, but if reg-step counts grow, this becomes repeated linear scanning/copying. A more efficient approach is to compute index ranges into the already-sorted regs (e.g., via lower_bound/upper_bound) and either iterate that range directly or append into a reused buffer.
#303
handle the case:
%134 = "neura.not"(%133) {dfg_id = 159 : i32, mapping_locs = [{id = 5 : i32, index_per_ii = 11 : i32, invalid_iterations = 0 : i32, resource = "tile", time_step = 11 : i32, x = 1 : i32, y = 1 : i32}]} : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
%151 = "neura.data_mov"(%134) {dfg_id = 175 : i32, mapping_locs = [{id = 16 : i32, index_per_ii = 11 : i32, invalid_iterations = 0 : i32, resource = "link", time_step = 11 : i32}, {id = 288 : i32, index_per_ii = 12 : i32, invalid_iterations = 0 : i32, per_tile_register_id = 0 : i32, resource = "register", time_step = 12 : i32}, {id = 28 : i32, index_per_ii = 13 : i32, invalid_iterations = 0 : i32, resource = "link", time_step = 13 : i32}, {id = 329 : i32, index_per_ii = 14 : i32, invalid_iterations = 0 : i32, per_tile_register_id = 9 : i32, resource = "register", time_step = 14 : i32}, {id = 329 : i32, index_per_ii = 15 : i32, invalid_iterations = 0 : i32, per_tile_register_id = 9 : i32, resource = "register", time_step = 15 : i32}, {id = 329 : i32, index_per_ii = 16 : i32, invalid_iterations = 0 : i32, per_tile_register_id = 9 : i32, resource = "register", time_step = 16 : i32}, {id = 329 : i32, index_per_ii = 0 : i32, invalid_iterations = 1 : i32, per_tile_register_id = 9 : i32, resource = "register", time_step = 17 : i32}, {id = 329 : i32, index_per_ii = 1 : i32, invalid_iterations = 1 : i32, per_tile_register_id = 9 : i32, resource = "register", time_step = 18 : i32}, {id = 329 : i32, index_per_ii = 2 : i32, invalid_iterations = 1 : i32, per_tile_register_id = 9 : i32, resource = "register", time_step = 19 : i32}]} : (!neura.data<i1, i1>) -> !neura.data<i1, i1>