docs(args): fix stale cap literals missed by #1056 #1064

Open
ChaoZheng109 wants to merge 2 commits into hw-native-sys:main from ChaoZheng109:fix-arg-cap-stale-strings

@ChaoZheng109 ChaoZheng109 commented Jun 16, 2026

Collaborator

What

Fixes two stale cap literals left behind by #1056 (which raised
CORE_MAX_TENSOR_ARGS 16→32 and lowered CORE_MAX_SCALAR_ARGS 32→16).
#1056 updated the scalar-limit error strings but missed:

  • pto_types.h (a2a3 + a5): the tensor-limit error string still read
    "exceeds MAX_TENSOR_ARGS=16". The cap is now 32 — a user who hits
    the tensor-arg cap would otherwise get a misleading message.
  • pto_runtime2_types.h (a2a3 + a5): the PTO2TaskPayload::init
    memcpy comment said "Both arrays are 1024B" (dates from an old
    128-scalar cap). The scalar arrays are now MAX_SCALAR_ARGS * 8 =
    128B.

Comment/string only — no ABI or behavior change.
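As a rough illustration of the arithmetic behind both corrections, the sketch below recomputes the scalar array size and builds the tensor-limit message from the cap constant instead of a hard-coded literal. The names `kMaxTensorArgs`, `kMaxScalarArgs`, and `tensor_limit_message` are hypothetical stand-ins, not the identifiers used in `pto_types.h`, and the 8-byte scalar width is taken from the "MAX_SCALAR_ARGS * 8" wording above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-ins for the caps after #1056 (the real ones live in arg_direction.h).
constexpr int kMaxTensorArgs = 32;
constexpr int kMaxScalarArgs = 16;

// Scalar array size, assuming 8-byte scalars.
constexpr std::size_t kScalarBytes = kMaxScalarArgs * sizeof(std::uint64_t);
static_assert(kScalarBytes == 128, "16 scalars * 8 B = 128 B");

// The stale "1024B" comment matches an older 128-scalar cap: 128 * 8 B = 1024 B.
static_assert(128 * sizeof(std::uint64_t) == 1024, "128 scalars * 8 B = 1024 B");

// Builds the message from the constant, so the text cannot drift from the cap.
inline std::string tensor_limit_message(int requested) {
    return "tensor count " + std::to_string(requested) +
           " exceeds MAX_TENSOR_ARGS=" + std::to_string(kMaxTensorArgs);
}
```

Deriving the message from the constant is only one possible way to keep such strings in sync; this PR itself fixes the literals by hand, and device-side code may not be able to use `std::string` at all.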

⚠️ Stacked on #1056

These literals are only correct on top of #1056's cap change. On
main, CORE_MAX_TENSOR_ARGS is still 16 and the scalar arrays are 256B,
so the current text is correct there. This branch is stacked on #1056:

Double the per-core kernel tensor-arg capacity (16 -> 32). The most
tensor-hungry in-tree kernel (spmd_paged_attention_highperf) already
uses 15 in-core tensors, leaving only one slot of headroom under the
old cap of 16.

Offset the cost by lowering CORE_MAX_SCALAR_ARGS 32 -> 16 so the
tensor+scalar sum stays 48. This keeps PTO2DispatchPayload at 512 B and
the SPMD context indices at 48/49, so per-dispatch latency is unchanged.
Repo-wide max in-core scalar usage is 8 (spmd_paged_attention), well
under the new 16-scalar cap.
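The slot budget described above can be checked with a few compile-time assertions. This is a minimal sketch: `ArgCaps`, `total_slots`, and the other names are hypothetical, and placing the SPMD context indices directly after the argument slots is an assumption inferred from the "48/49" figures, not something confirmed by the headers.

```cpp
#include <cassert>

// Hypothetical pair of per-core caps (tensor slots, scalar slots).
struct ArgCaps {
    int tensors;
    int scalars;
};

constexpr ArgCaps kOldCaps{16, 32};  // before #1056
constexpr ArgCaps kNewCaps{32, 16};  // after #1056

constexpr int total_slots(ArgCaps c) { return c.tensors + c.scalars; }

// The combined slot count is unchanged, which is why the payload size can stay the same.
static_assert(total_slots(kOldCaps) == 48, "16 + 32 = 48");
static_assert(total_slots(kNewCaps) == 48, "32 + 16 = 48");

// Assumption: the two SPMD context indices sit right after the argument slots.
constexpr int kFirstContextIndex = total_slots(kNewCaps);       // 48
constexpr int kSecondContextIndex = total_slots(kNewCaps) + 1;  // 49

// Headroom for the heaviest in-tree users quoted in the commit message.
constexpr int kMaxTensorsUsed = 15;  // spmd_paged_attention_highperf
constexpr int kMaxScalarsUsed = 8;   // spmd_paged_attention
constexpr int tensor_headroom(ArgCaps c) { return c.tensors - kMaxTensorsUsed; }
constexpr int scalar_headroom(ArgCaps c) { return c.scalars - kMaxScalarsUsed; }
```

Under these numbers, tensor headroom grows from 1 slot to 17, while scalar headroom shrinks from 24 slots to 8.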

- arg_direction.h: CORE_MAX_TENSOR_ARGS 16->32, CORE_MAX_SCALAR_ARGS 32->16
- DepGenRecord (tensor-driven): size 2624->4672, _pad0 20->4,
  DEP_GEN_OVERFLOW_DEPS_PER_RECORD 326->582; docs/dfx/dep_gen.md updated
- PTO2TaskPayload: tensors region 2048->4096 B, scalar-region guard
  256->128 B; stale cache-line layout comments corrected
- pto_types.h: scalar-limit error strings 32->16; guard add_scalars,
  add_scalars_i32, copy_scalars_from against negative count (signed
  count bypassed the bounds check -> oversized memcpy / negative
  scalar_count_)
- intrinsic.h / pto2_dispatch_payload.h: comment-only (indices and
  payload size return to their original 48/49 and 512 B)
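The negative-count issue mentioned in the list above can be sketched as follows. `ScalarArgs` and `kScalarCap` are hypothetical and much simpler than the real class in `pto_types.h`; the point is only that an upper-bound check on a signed count accepts negative values, which then turn into a huge `size_t` length in `memcpy` and push the stored count below zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr int kScalarCap = 16;  // hypothetical stand-in for MAX_SCALAR_ARGS

// Hypothetical container, not the real type from pto_types.h.
struct ScalarArgs {
    std::uint64_t scalars[kScalarCap] = {};
    int scalar_count = 0;

    // Returns false and leaves the state untouched when count is out of range.
    bool add_scalars(const std::uint64_t* src, int count) {
        // Checking only `scalar_count + count > kScalarCap` would accept a
        // negative count, so the lower bound is tested explicitly.
        if (count < 0 || count > kScalarCap - scalar_count) {
            return false;
        }
        std::memcpy(scalars + scalar_count, src,
                    static_cast<std::size_t>(count) * sizeof(std::uint64_t));
        scalar_count += count;
        return true;
    }
};
```

Writing the upper bound as `count > kScalarCap - scalar_count` also avoids signed overflow in the sum for very large counts; whether #1056 uses this exact form is not stated in the commit message.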

Verified: a2a3 + a5 build, sim (vector_example, scalar_data) pass;
hardware A/B perf-neutral on qwen3 decode (Device +0.3%, within noise).

hw-native-sys#1056 raised CORE_MAX_TENSOR_ARGS to 32 and lowered CORE_MAX_SCALAR_ARGS
to 16. It updated the scalar-limit error strings but left two stale
literals behind:

- pto_types.h: the tensor-limit error string still read
  "exceeds MAX_TENSOR_ARGS=16"; the cap is now 32 (a2a3 + a5).
- pto_runtime2_types.h: the PTO2TaskPayload::init memcpy comment said
  "Both arrays are 1024B" (dates from a 128-scalar cap). The scalar
  arrays are now MAX_SCALAR_ARGS * 8 = 128B (a2a3 + a5).

Comment/string only; no ABI or behavior change.
@gemini-code-assist

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

coderabbitai Bot commented Jun 16, 2026

Warning

Review limit reached

@ChaoZheng109, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 22 minutes and 27 seconds.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e27e8847-34e4-4186-9cab-893a4724c644

📥 Commits

Reviewing files that changed from the base of the PR and between 19b2c0b and d03deca.

📒 Files selected for processing (14)
  • docs/dfx/dep_gen.md
  • src/a2a3/platform/include/common/dep_gen.h
  • src/a2a3/runtime/tensormap_and_ringbuffer/common/intrinsic.h
  • src/a2a3/runtime/tensormap_and_ringbuffer/host/dep_gen_replay.cpp
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto2_dispatch_payload.h
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_runtime2_types.h
  • src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_types.h
  • src/a5/platform/include/common/dep_gen.h
  • src/a5/runtime/tensormap_and_ringbuffer/common/intrinsic.h
  • src/a5/runtime/tensormap_and_ringbuffer/host/dep_gen_replay.cpp
  • src/a5/runtime/tensormap_and_ringbuffer/runtime/pto2_dispatch_payload.h
  • src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_runtime2_types.h
  • src/a5/runtime/tensormap_and_ringbuffer/runtime/pto_types.h
  • src/common/task_interface/arg_direction.h
