docs(args): fix stale cap literals missed by #1056 #1064
Double the per-core kernel tensor-arg capacity (16 -> 32). The most tensor-hungry in-tree kernel (spmd_paged_attention_highperf) already uses 15 in-core tensors, leaving only one slot of headroom under the old cap of 16. Offset the cost by lowering CORE_MAX_SCALAR_ARGS 32 -> 16 so the tensor+scalar sum stays 48. This keeps PTO2DispatchPayload at 512 B and the SPMD context indices at 48/49, so per-dispatch latency is unchanged. Repo-wide max in-core scalar usage is 8 (spmd_paged_attention), well under the new 16-scalar cap.
- arg_direction.h: CORE_MAX_TENSOR_ARGS 16->32, CORE_MAX_SCALAR_ARGS 32->16
- DepGenRecord (tensor-driven): size 2624->4672, _pad0 20->4, DEP_GEN_OVERFLOW_DEPS_PER_RECORD 326->582; docs/dfx/dep_gen.md updated
- PTO2TaskPayload: tensors region 2048->4096 B, scalar-region guard 256->128 B; stale cache-line layout comments corrected
- pto_types.h: scalar-limit error strings 32->16; guard add_scalars, add_scalars_i32 and copy_scalars_from against a negative count (the signed count bypassed the bounds check -> oversized memcpy / negative scalar_count_)
- intrinsic.h / pto2_dispatch_payload.h: comment-only (indices and payload size return to their original 48/49 and 512 B)
Verified: a2a3 + a5 build, sim (vector_example, scalar_data) pass; hardware A/B perf-neutral on qwen3 decode (Device +0.3%, within noise).
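The negative-count guard is the one behavior change in this stack. A minimal sketch of the failure mode and the fix, assuming a simplified, hypothetical `ScalarArgs` struct, a `kCoreMaxScalarArgs` constant standing in for CORE_MAX_SCALAR_ARGS, and exceptions as the error path; the real pto_types.h signatures and error reporting may differ.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for CORE_MAX_SCALAR_ARGS (16 after #1056).
constexpr std::int32_t kCoreMaxScalarArgs = 16;

// Hypothetical, simplified scalar holder; not the real pto_types.h class.
struct ScalarArgs {
    std::uint64_t scalars[kCoreMaxScalarArgs] = {};
    std::int32_t scalar_count_ = 0;

    void add_scalars(const std::uint64_t* values, std::int32_t count) {
        // Reject negative counts first: a check like
        // `scalar_count_ + count > cap` is satisfied by any negative count,
        // which would then become a huge size_t length in memcpy and drive
        // scalar_count_ negative.
        if (count < 0) {
            throw std::invalid_argument("add_scalars: negative count " + std::to_string(count));
        }
        // Written as `count > cap - used` so the comparison cannot overflow.
        if (count > kCoreMaxScalarArgs - scalar_count_) {
            throw std::length_error("add_scalars: exceeds MAX_SCALAR_ARGS=" +
                                    std::to_string(kCoreMaxScalarArgs));
        }
        if (count == 0) return;  // nothing to copy; `values` may be null
        std::memcpy(scalars + scalar_count_, values,
                    static_cast<std::size_t>(count) * sizeof(std::uint64_t));
        scalar_count_ += count;
    }
};
```

Ordering matters here: the sign check has to run before the capacity check, because only the capacity check alone is what a negative signed count slips past.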
hw-native-sys#1056 raised CORE_MAX_TENSOR_ARGS to 32 and lowered CORE_MAX_SCALAR_ARGS to 16. It updated the scalar-limit error strings but left two stale literals behind:
- pto_types.h: the tensor-limit error string still read "exceeds MAX_TENSOR_ARGS=16"; the cap is now 32 (a2a3 + a5).
- pto_runtime2_types.h: the PTO2TaskPayload::init memcpy comment said "Both arrays are 1024B" (dates from a 128-scalar cap). The scalar arrays are now MAX_SCALAR_ARGS * 8 = 128B (a2a3 + a5).
Comment/string only; no ABI or behavior change.
What
Fixes two stale cap literals left behind by #1056 (which raised CORE_MAX_TENSOR_ARGS 16→32 and lowered CORE_MAX_SCALAR_ARGS 32→16). #1056 updated the scalar-limit error strings but missed:
- pto_types.h (a2a3 + a5): the tensor-limit error string still read "exceeds MAX_TENSOR_ARGS=16". The cap is now 32, so a user who hits the tensor-arg cap would otherwise get a misleading message.
- pto_runtime2_types.h (a2a3 + a5): the PTO2TaskPayload::init memcpy comment said "Both arrays are 1024B" (dates from an old 128-scalar cap). The scalar arrays are now MAX_SCALAR_ARGS * 8 = 128B.
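The size arithmetic behind both literals, and behind the budget swap in #1056, can be checked at compile time. This sketch is illustrative only: the constant names are hypothetical, and the 128 B per tensor slot is inferred from the "tensors region 2048 -> 4096 B" figure in #1056 rather than read from the headers.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical names mirroring the values quoted in #1056 and this PR.
constexpr std::size_t kTensorArgsOld = 16, kTensorArgsNew = 32;
constexpr std::size_t kScalarArgsOld = 32, kScalarArgsNew = 16;
constexpr std::size_t kScalarBytes = sizeof(std::uint64_t);  // 8 B per scalar
// Assumption: 128 B per tensor slot, inferred from 2048 B / 16 tensors.
constexpr std::size_t kTensorSlotBytes = 128;

// The combined arg budget is unchanged, which is why the SPMD context
// indices can stay at 48/49.
static_assert(kTensorArgsOld + kScalarArgsOld == 48, "old budget");
static_assert(kTensorArgsNew + kScalarArgsNew == 48, "new budget");

// PTO2TaskPayload tensors region: 2048 B -> 4096 B.
static_assert(kTensorArgsOld * kTensorSlotBytes == 2048, "old tensors region");
static_assert(kTensorArgsNew * kTensorSlotBytes == 4096, "new tensors region");

// Scalar arrays, MAX_SCALAR_ARGS * 8, across the three cap generations.
static_assert(128 * kScalarBytes == 1024, "128-scalar cap: the stale comment");
static_assert(kScalarArgsOld * kScalarBytes == 256, "32-scalar cap: main");
static_assert(kScalarArgsNew * kScalarBytes == 128, "16-scalar cap: after #1056");
```

Each stale literal corresponds to one of these lines being true under an older cap, which is why the strings only drifted once the cap changed.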
Comment/string only; no ABI or behavior change.
These literals are only correct on top of #1056's cap change. On main, CORE_MAX_TENSOR_ARGS is still 16 and the scalar arrays are 256B, so the existing text is correct there. This branch is stacked on #1056 ("Update: raise CORE_MAX_TENSOR_ARGS to 32, lower scalars to 16"); once #1056 lands, rebasing reduces it to the 4-line doc fix here.