Update codegen to handle the new case of datamov chain#304

Open

n0thingNoob wants to merge 5 commits intomainfrom

Collaborator

n0thingNoob commented Mar 27, 2026 •

edited

Loading

#303
handle the case:
%134 = "neura.not"(%133) {dfg_id = 159 : i32, mapping_locs = [{id = 5 : i32, index_per_ii = 11 : i32, invalid_iterations = 0 : i32, resource = "tile", time_step = 11 : i32, x = 1 : i32, y = 1 : i32}]} : (!neura.data<i1, i1>) -> !neura.data<i1, i1>
%151 = "neura.data_mov"(%134) {dfg_id = 175 : i32, mapping_locs = [{id = 16 : i32, index_per_ii = 11 : i32, invalid_iterations = 0 : i32, resource = "link", time_step = 11 : i32}, {id = 288 : i32, index_per_ii = 12 : i32, invalid_iterations = 0 : i32, per_tile_register_id = 0 : i32, resource = "register", time_step = 12 : i32}, {id = 28 : i32, index_per_ii = 13 : i32, invalid_iterations = 0 : i32, resource = "link", time_step = 13 : i32}, {id = 329 : i32, index_per_ii = 14 : i32, invalid_iterations = 0 : i32, per_tile_register_id = 9 : i32, resource = "register", time_step = 14 : i32}, {id = 329 : i32, index_per_ii = 15 : i32, invalid_iterations = 0 : i32, per_tile_register_id = 9 : i32, resource = "register", time_step = 15 : i32}, {id = 329 : i32, index_per_ii = 16 : i32, invalid_iterations = 0 : i32, per_tile_register_id = 9 : i32, resource = "register", time_step = 16 : i32}, {id = 329 : i32, index_per_ii = 0 : i32, invalid_iterations = 1 : i32, per_tile_register_id = 9 : i32, resource = "register", time_step = 17 : i32}, {id = 329 : i32, index_per_ii = 1 : i32, invalid_iterations = 1 : i32, per_tile_register_id = 9 : i32, resource = "register", time_step = 18 : i32}, {id = 329 : i32, index_per_ii = 2 : i32, invalid_iterations = 1 : i32, per_tile_register_id = 9 : i32, resource = "register", time_step = 19 : i32}]} : (!neura.data<i1, i1>) -> !neura.data<i1, i1>

n0thingNoob added 3 commits

March 25, 2026 16:53


          Merge branch 'main' into testbench

7f9c958


          Merge remote-tracking branch 'origin/main' into testbench

fb61f8d


          update GenerateCodePass, handling the datamov link

de45e0f

Copilot AI review requested due to automatic review settings

March 27, 2026 03:22

Copilot started reviewing on behalf of n0thingNoob

March 27, 2026 03:22

n0thingNoob requested review from Jackcuii, ShangkunLi, Copilot and tancheng and removed request for Copilot

March 27, 2026 03:23

tancheng linked an issue

that may be closed by this pull request

[Urgent] Repeated consumption of a single logical data in different timesteps #303

Open

ShangkunLi approved these changes

View reviewed changes

tancheng assigned n0thingNoob and ShangkunLi

Copilot started reviewing on behalf of n0thingNoob

March 27, 2026 03:39

Copilot AI reviewed

View reviewed changes

Contributor

Copilot AI left a comment

Pull request overview

Updates Neura codegen to correctly model/emit DATA_MOV/CTRL_MOV when a routing path contains register steps between link hops (the “datamov chain” case from issue #303).

Changes:

Added helper to collect register steps between adjacent link time steps.
Enhanced intermediate hop emission to handle link→reg→link segments via deposit / register transfer(s) / egress.
Updated synthetic DFG node construction to mirror the materialized instruction stream for these segments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

lib/NeuraDialect/Transforms/GenerateCodePass.cpp

Comment on lines +700 to +710

+                SmallVector<RegStep, 4>
+                collectRegsBetweenLinks(const SmallVector<RegStep, 4> &regs,
+                                        int start_time_step, int end_time_step) const {
+                  SmallVector<RegStep, 4> segment;
+                  for (const RegStep &reg_step : regs) {
+                    if (reg_step.time_step <= start_time_step) continue;
+                    if (reg_step.time_step >= end_time_step) break;
+                    segment.push_back(reg_step);
+                  }
+                  return segment;
+                }

Copilot AI Mar 27, 2026

This function relies on regs being sorted by time_step (due to the break on >= end_time_step). That assumption isn’t documented/guarded here, so future call sites could accidentally pass an unsorted vector and silently drop valid steps. Consider either (a) adding an explicit assertion/check that regs is sorted, or (b) implementing this via lower_bound/upper_bound on the sorted range (and documenting the sorted precondition).

Copilot uses AI. Check for mistakes.

lib/NeuraDialect/Transforms/GenerateCodePass.cpp

Comment on lines +817 to +830

+                    SmallVector<RegStep, 4> inter_link_regs =
+                        collectRegsBetweenLinks(regs, links[i - 1].time_step, links[i].time_step);
+                    if (inter_link_regs.empty()) {
+                      int time_step = links[i].time_step;
+                      uint64_t sig = static_cast<uint64_t>(llvm::hash_combine(mid_tile, time_step, cur_link));
+                      int hop_id =
+                          base_mov_id >= 0 ? base_mov_id * 10000 + static_cast<int>(hop_counter) : -1;
+                      ++hop_counter;
+                      if (hop_signatures.insert(sig).second) {
+                        placeRouterHop(topo, mid_tile, time_step, incoming_dir, outgoing_dir,
+                                       /*asCtrlMov=*/IsCtrl, hop_id);
+                      }
+                      continue;
+                    }

Copilot AI Mar 27, 2026

Router hops are deduplicated via hop_signatures, but the newly emitted deposit / reg-transfer / egress instructions for link→reg→link segments are always placed without any dedupe. If the same (tile, time_step, ...) sequence is encountered multiple times (similar to why hops needed dedupe), this can cause duplicate instruction emission and potentially conflicting programming of the same cycle/resource. If placeDstDeposit / placeRegisterTransfer / placeSrcEgress are not already idempotent, consider adding signature-based dedupe for these emissions (separate signature sets per instruction kind), or make the placement helpers return early when an identical instruction is already present.

Copilot uses AI. Check for mistakes.

lib/NeuraDialect/Transforms/GenerateCodePass.cpp

Comment on lines +835 to +837

+                    placeDstDeposit(topo, mid_tile, inter_link_regs.front().time_step,
+                                    incoming_dir, inter_link_regs.front().regId,
+                                    /*asCtrlMov=*/IsCtrl, deposit_id);

Copilot AI Mar 27, 2026

Router hops are deduplicated via hop_signatures, but the newly emitted deposit / reg-transfer / egress instructions for link→reg→link segments are always placed without any dedupe. If the same (tile, time_step, ...) sequence is encountered multiple times (similar to why hops needed dedupe), this can cause duplicate instruction emission and potentially conflicting programming of the same cycle/resource. If placeDstDeposit / placeRegisterTransfer / placeSrcEgress are not already idempotent, consider adding signature-based dedupe for these emissions (separate signature sets per instruction kind), or make the placement helpers return early when an identical instruction is already present.

Copilot uses AI. Check for mistakes.

lib/NeuraDialect/Transforms/GenerateCodePass.cpp

Comment on lines +845 to +846

		placeRegisterTransfer<IsCtrl>(topo, mid_tile, cur.time_step,
		prev.regId, cur.regId, transfer_id);

Copilot AI Mar 27, 2026

Router hops are deduplicated via hop_signatures, but the newly emitted deposit / reg-transfer / egress instructions for link→reg→link segments are always placed without any dedupe. If the same (tile, time_step, ...) sequence is encountered multiple times (similar to why hops needed dedupe), this can cause duplicate instruction emission and potentially conflicting programming of the same cycle/resource. If placeDstDeposit / placeRegisterTransfer / placeSrcEgress are not already idempotent, consider adding signature-based dedupe for these emissions (separate signature sets per instruction kind), or make the placement helpers return early when an identical instruction is already present.

Copilot uses AI. Check for mistakes.

lib/NeuraDialect/Transforms/GenerateCodePass.cpp

Comment on lines +852 to +853

		placeSrcEgress(topo, mid_tile, links[i].time_step, outgoing_dir,
		inter_link_regs.back().regId, /asCtrlMov=/IsCtrl, egress_id);

Copilot AI Mar 27, 2026

Router hops are deduplicated via hop_signatures, but the newly emitted deposit / reg-transfer / egress instructions for link→reg→link segments are always placed without any dedupe. If the same (tile, time_step, ...) sequence is encountered multiple times (similar to why hops needed dedupe), this can cause duplicate instruction emission and potentially conflicting programming of the same cycle/resource. If placeDstDeposit / placeRegisterTransfer / placeSrcEgress are not already idempotent, consider adding signature-based dedupe for these emissions (separate signature sets per instruction kind), or make the placement helpers return early when an identical instruction is already present.

Copilot uses AI. Check for mistakes.

lib/NeuraDialect/Transforms/GenerateCodePass.cpp

Comment on lines +1367 to +1369

+                          SmallVector<RegStep, 4> inter_link_regs =
+                              collectRegsBetweenLinks(regs, link_steps[i - 1].time_step,
+                                                      link_steps[i].time_step);

Copilot AI Mar 27, 2026

This allocates and fills a new SmallVector for every adjacent link pair. With small regs this is likely fine, but if reg-step counts grow, this becomes repeated linear scanning/copying. A more efficient approach is to compute index ranges into the already-sorted regs (e.g., via lower_bound/upper_bound) and either iterate that range directly or append into a reused buffer.

Copilot uses AI. Check for mistakes.

n0thingNoob added 2 commits

March 27, 2026 04:41


          update gemm and relu test

90eb5c3


          update fft

2f36662

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet