Skip to content

Update codegen to handle the new case of datamov chain#304

Open
n0thingNoob wants to merge 5 commits intomainfrom
testbench
Open

Update codegen to handle the new case of datamov chain#304
n0thingNoob wants to merge 5 commits intomainfrom
testbench

Conversation

@n0thingNoob
Copy link
Collaborator

@n0thingNoob n0thingNoob commented Mar 27, 2026

#303
handle the case:
%134 = "neura.not"(%133) {dfg_id = 159 : i32, mapping_locs = [{id = 5 : i32, index_per_ii = 11 : i32, invalid_iterations = 0 : i32, resource = "tile", time_step = 11 : i32, x = 1 : i32, y = 1 : i32}]} : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
%151 = "neura.data_mov"(%134) {dfg_id = 175 : i32, mapping_locs = [{id = 16 : i32, index_per_ii = 11 : i32, invalid_iterations = 0 : i32, resource = "link", time_step = 11 : i32}, {id = 288 : i32, index_per_ii = 12 : i32, invalid_iterations = 0 : i32, per_tile_register_id = 0 : i32, resource = "register", time_step = 12 : i32}, {id = 28 : i32, index_per_ii = 13 : i32, invalid_iterations = 0 : i32, resource = "link", time_step = 13 : i32}, {id = 329 : i32, index_per_ii = 14 : i32, invalid_iterations = 0 : i32, per_tile_register_id = 9 : i32, resource = "register", time_step = 14 : i32}, {id = 329 : i32, index_per_ii = 15 : i32, invalid_iterations = 0 : i32, per_tile_register_id = 9 : i32, resource = "register", time_step = 15 : i32}, {id = 329 : i32, index_per_ii = 16 : i32, invalid_iterations = 0 : i32, per_tile_register_id = 9 : i32, resource = "register", time_step = 16 : i32}, {id = 329 : i32, index_per_ii = 0 : i32, invalid_iterations = 1 : i32, per_tile_register_id = 9 : i32, resource = "register", time_step = 17 : i32}, {id = 329 : i32, index_per_ii = 1 : i32, invalid_iterations = 1 : i32, per_tile_register_id = 9 : i32, resource = "register", time_step = 18 : i32}, {id = 329 : i32, index_per_ii = 2 : i32, invalid_iterations = 1 : i32, per_tile_register_id = 9 : i32, resource = "register", time_step = 19 : i32}]} : (!neura.data<i1, i1>) -> !neura.data<i1, i1>

image image image

Copilot AI review requested due to automatic review settings March 27, 2026 03:22
@n0thingNoob n0thingNoob requested review from Jackcuii, ShangkunLi, Copilot and tancheng and removed request for Copilot March 27, 2026 03:23
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates Neura codegen to correctly model/emit DATA_MOV/CTRL_MOV when a routing path contains register steps between link hops (the “datamov chain” case from issue #303).

Changes:

  • Added helper to collect register steps between adjacent link time steps.
  • Enhanced intermediate hop emission to handle link→reg→link segments via deposit / register transfer(s) / egress.
  • Updated synthetic DFG node construction to mirror the materialized instruction stream for these segments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +700 to +710
SmallVector<RegStep, 4>
collectRegsBetweenLinks(const SmallVector<RegStep, 4> &regs,
int start_time_step, int end_time_step) const {
SmallVector<RegStep, 4> segment;
for (const RegStep &reg_step : regs) {
if (reg_step.time_step <= start_time_step) continue;
if (reg_step.time_step >= end_time_step) break;
segment.push_back(reg_step);
}
return segment;
}
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function relies on regs being sorted by time_step (due to the break on >= end_time_step). That assumption isn’t documented/guarded here, so future call sites could accidentally pass an unsorted vector and silently drop valid steps. Consider either (a) adding an explicit assertion/check that regs is sorted, or (b) implementing this via lower_bound/upper_bound on the sorted range (and documenting the sorted precondition).

Copilot uses AI. Check for mistakes.
Comment on lines +817 to +830
SmallVector<RegStep, 4> inter_link_regs =
collectRegsBetweenLinks(regs, links[i - 1].time_step, links[i].time_step);
if (inter_link_regs.empty()) {
int time_step = links[i].time_step;
uint64_t sig = static_cast<uint64_t>(llvm::hash_combine(mid_tile, time_step, cur_link));
int hop_id =
base_mov_id >= 0 ? base_mov_id * 10000 + static_cast<int>(hop_counter) : -1;
++hop_counter;
if (hop_signatures.insert(sig).second) {
placeRouterHop(topo, mid_tile, time_step, incoming_dir, outgoing_dir,
/*asCtrlMov=*/IsCtrl, hop_id);
}
continue;
}
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Router hops are deduplicated via hop_signatures, but the newly emitted deposit / reg-transfer / egress instructions for link→reg→link segments are always placed without any dedupe. If the same (tile, time_step, ...) sequence is encountered multiple times (similar to why hops needed dedupe), this can cause duplicate instruction emission and potentially conflicting programming of the same cycle/resource. If placeDstDeposit / placeRegisterTransfer / placeSrcEgress are not already idempotent, consider adding signature-based dedupe for these emissions (separate signature sets per instruction kind), or make the placement helpers return early when an identical instruction is already present.

Copilot uses AI. Check for mistakes.
Comment on lines +835 to +837
placeDstDeposit(topo, mid_tile, inter_link_regs.front().time_step,
incoming_dir, inter_link_regs.front().regId,
/*asCtrlMov=*/IsCtrl, deposit_id);
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Router hops are deduplicated via hop_signatures, but the newly emitted deposit / reg-transfer / egress instructions for link→reg→link segments are always placed without any dedupe. If the same (tile, time_step, ...) sequence is encountered multiple times (similar to why hops needed dedupe), this can cause duplicate instruction emission and potentially conflicting programming of the same cycle/resource. If placeDstDeposit / placeRegisterTransfer / placeSrcEgress are not already idempotent, consider adding signature-based dedupe for these emissions (separate signature sets per instruction kind), or make the placement helpers return early when an identical instruction is already present.

Copilot uses AI. Check for mistakes.
Comment on lines +845 to +846
placeRegisterTransfer<IsCtrl>(topo, mid_tile, cur.time_step,
prev.regId, cur.regId, transfer_id);
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Router hops are deduplicated via hop_signatures, but the newly emitted deposit / reg-transfer / egress instructions for link→reg→link segments are always placed without any dedupe. If the same (tile, time_step, ...) sequence is encountered multiple times (similar to why hops needed dedupe), this can cause duplicate instruction emission and potentially conflicting programming of the same cycle/resource. If placeDstDeposit / placeRegisterTransfer / placeSrcEgress are not already idempotent, consider adding signature-based dedupe for these emissions (separate signature sets per instruction kind), or make the placement helpers return early when an identical instruction is already present.

Copilot uses AI. Check for mistakes.
Comment on lines +852 to +853
placeSrcEgress(topo, mid_tile, links[i].time_step, outgoing_dir,
inter_link_regs.back().regId, /*asCtrlMov=*/IsCtrl, egress_id);
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Router hops are deduplicated via hop_signatures, but the newly emitted deposit / reg-transfer / egress instructions for link→reg→link segments are always placed without any dedupe. If the same (tile, time_step, ...) sequence is encountered multiple times (similar to why hops needed dedupe), this can cause duplicate instruction emission and potentially conflicting programming of the same cycle/resource. If placeDstDeposit / placeRegisterTransfer / placeSrcEgress are not already idempotent, consider adding signature-based dedupe for these emissions (separate signature sets per instruction kind), or make the placement helpers return early when an identical instruction is already present.

Copilot uses AI. Check for mistakes.
Comment on lines +1367 to +1369
SmallVector<RegStep, 4> inter_link_regs =
collectRegsBetweenLinks(regs, link_steps[i - 1].time_step,
link_steps[i].time_step);
Copy link

Copilot AI Mar 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This allocates and fills a new SmallVector for every adjacent link pair. With small regs this is likely fine, but if reg-step counts grow, this becomes repeated linear scanning/copying. A more efficient approach is to compute index ranges into the already-sorted regs (e.g., via lower_bound/upper_bound) and either iterate that range directly or append into a reused buffer.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Urgent] Repeated consumption of a single logical data in different timesteps

3 participants