[TRTLLM-12982][chore] relocate `torch_multi_arange` by ixlmar · Pull Request #15416 · NVIDIA/TensorRT-LLM

ixlmar · 2026-06-16T12:34:49Z

Description

Follow-up on #14693 (comment).

Commit 800c7ee is from #15413, which is to be merged before this PR.

Test Coverage

Covered by existing tests

PR Checklist

Please review the following before submitting your PR:

PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.
PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.
Test cases are provided for new code paths (see test instructions)
If PR introduces API changes, an appropriate PR label is added - either api-compatible or api-breaking. For api-breaking, include BREAKING in the PR title.
Any new dependencies have been scanned for license and vulnerabilities
CODEOWNERS updated if ownership changes
Documentation updated as needed
Update tava architecture diagram if there is a significant design change in PR.
The reviewers assigned automatically/manually are appropriate for the PR.
Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

To see a list of available CI bot commands, please comment /bot help.

Summary by CodeRabbit

Improvements
- Encoder CUDA graphs now properly detect multi-item scoring scenarios and fall back to eager execution when necessary.
Refactoring
- Optimized attention metadata to accept multi-item configuration during preparation phase instead of forward pass.
- Reorganized utility functions for improved code maintainability.
Chores
- Updated test infrastructure and file organization.

ixlmar · 2026-06-16T12:41:18Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-16T12:46:50Z

PR_Github #54587 [ run ] triggered by Bot. Commit: faad7dc

brb-nv

LGTM.

coderabbitai · 2026-06-16T18:10:07Z

📝 Walkthrough

torch_multi_arange (with _AcceptSyncCompute/ACCEPT_SYNC_COMPUTE) is moved from sampling_utils.py to utils.py and all import sites are updated. Separately, multi_item_part_lens is removed from AttentionForwardArgs and from all attention forward signatures, and instead passed as a keyword argument to AttentionMetadata.prepare(), with FlashInfer caching the computed FlashInferMultiItemParams in _multi_item_params for use during plan().

Changes

torch_multi_arange relocation and multi_item_part_lens prepare() refactor

Layer	File(s)	Summary
`torch_multi_arange` relocated to `utils.py`	`tensorrt_llm/_torch/utils.py`, `tensorrt_llm/_torch/pyexecutor/sampling_utils.py`, `tensorrt_llm/_torch/pyexecutor/sampler.py`, `tests/unittest/_torch/test_torch_multi_arange.py`, `tests/integration/test_lists/test-db/l0_a10.yml`, `.pre-commit-config.yaml`, `legacy-files.txt`, `pyproject.toml`, `ruff-legacy.toml`	`_AcceptSyncCompute`, `ACCEPT_SYNC_COMPUTE`, and `torch_multi_arange` are added to `utils.py` and deleted from `sampling_utils.py`; `sampler.py` import is redirected; test, test-list, and lint config entries are updated to the new path.
`AttentionMetadata.prepare()` and `AttentionForwardArgs` contract	`tensorrt_llm/_torch/attention_backend/interface.py`	`AttentionMetadata.prepare()` gains a keyword-only `multi_item_part_lens` parameter; `AttentionForwardArgs` drops its `multi_item_part_lens` field, removing multi-item layout from per-forward args.
Backend `prepare()` enforce/reject multi_item_part_lens	`tensorrt_llm/_torch/attention_backend/vanilla.py`, `tensorrt_llm/_torch/attention_backend/star_flashinfer.py`, `tensorrt_llm/_torch/attention_backend/trtllm.py`	`VanillaAttentionMetadata`, `StarAttentionMetadata`, `TrtllmAttentionMetadata`, and `prepare_encoder_only` each add the keyword-only `multi_item_part_lens` parameter and raise `ValueError` when non-`None`; per-forward `ValueError` checks are removed.
FlashInfer metadata caches multi_item_params at prepare() time	`tensorrt_llm/_torch/attention_backend/flashinfer.py`	`FlashInferAttentionMetadata` gains `_multi_item_params` field and `_process_multi_item_part_lens()` instance method; `prepare()` computes and stores multi-item tensors; `plan()` passes `_multi_item_params` into `PlanParams`; `forward_impl`/`forward()` have `multi_item_part_lens` removed; `metadata.plan()` is wrapped in `nvtx_range`.
`Attention` module removes `multi_item_part_lens` from forward path	`tensorrt_llm/_torch/modules/attention.py`	`_attn_impl`, `forward_impl`, and `forward` drop `multi_item_part_lens` parameters; `AttentionForwardArgs` construction no longer includes it; the RoPE `position_ids` rewrite block for multi-item scoring is deleted.
Executor and LLM API wire `multi_item_part_lens` into `prepare()`	`tensorrt_llm/_torch/pyexecutor/cuda_graph_runner.py`, `tensorrt_llm/_torch/pyexecutor/model_engine.py`, `tensorrt_llm/llmapi/llm.py`	`EncoderCUDAGraphRunner` falls back to eager when `multi_item_part_lens` is present; `model_engine._prepare_encoder_inputs` reads and passes `multi_item_part_lens` to `prepare_encoder_only()`/`prepare()`, asserting `None` on CUDA-graph replay; `llm.py encode()` gains `@torch.inference_mode()` and computes CUDA `position_ids` via `torch_multi_arange` for multi-item scoring.
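
The contract change summarized in the table can be illustrated with a minimal sketch. The class names (`SketchMetadata`, `RejectingMetadata`, `CachingMetadata`) and the layout of the cached value are hypothetical stand-ins, not the TensorRT-LLM classes. Only the parameter name `multi_item_part_lens`, its keyword-only form, the `ValueError` on unsupported backends, and the `_multi_item_params` field name come from the summary.

```python
from typing import Optional, Sequence

# Per request: [prefix_len, item_len, ...]
PartLens = Sequence[Sequence[int]]


class SketchMetadata:
    """Hypothetical base class standing in for an attention metadata object."""

    def prepare(self, *, multi_item_part_lens: Optional[PartLens] = None) -> None:
        raise NotImplementedError


class RejectingMetadata(SketchMetadata):
    """Backend without multi-item support: fail at prepare() time."""

    def prepare(self, *, multi_item_part_lens: Optional[PartLens] = None) -> None:
        if multi_item_part_lens is not None:
            raise ValueError("multi_item_part_lens is not supported by this backend")


class CachingMetadata(SketchMetadata):
    """Backend with multi-item support: derive parameters once and cache them."""

    def __init__(self) -> None:
        self._multi_item_params: Optional[dict] = None

    def prepare(self, *, multi_item_part_lens: Optional[PartLens] = None) -> None:
        if multi_item_part_lens is None:
            self._multi_item_params = None
            return
        # Assumed layout, for illustration only.
        self._multi_item_params = {
            "prefix_lens": [lens[0] for lens in multi_item_part_lens],
            "item_lens": [list(lens[1:]) for lens in multi_item_part_lens],
        }

    def plan(self) -> Optional[dict]:
        # plan() reads the cached value instead of taking a forward() argument.
        return self._multi_item_params
```

Because the parameter is keyword-only, a positional call such as `prepare([[4, 2]])` fails with `TypeError`, which keeps call sites explicit about what they pass.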

Sequence Diagram(s)

sequenceDiagram
    participant encode as llm.encode()
    participant model_engine as _prepare_encoder_inputs
    participant cuda_runner as EncoderCUDAGraphRunner
    participant metadata as FlashInferAttentionMetadata
    participant plan as FlashInferAttentionMetadata.plan()

    encode->>encode: compute position_ids via torch_multi_arange
    encode->>model_engine: inputs (multi_item_part_lens, position_ids)
    model_engine->>cuda_runner: maybe_get_cuda_graph(inputs)
    cuda_runner-->>model_engine: (None, None) — fallback to eager
    model_engine->>metadata: prepare(multi_item_part_lens=...)
    metadata->>metadata: _process_multi_item_part_lens() → _multi_item_params
    model_engine->>plan: plan(...)
    plan->>plan: PlanParams(multi_item_params=_multi_item_params)
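
The fallback step in the diagram can be sketched roughly as follows. `maybe_get_cuda_graph` is the name shown in the diagram, but its signature, the `inputs` dictionary keys, and the `captured` cache are assumptions made for illustration and may differ from the actual runner.

```python
from typing import Any, Dict, Optional, Tuple


class SketchEncoderGraphRunner:
    """Hypothetical stand-in for a CUDA graph runner."""

    def __init__(self) -> None:
        # Maps a batch size to (graph handle, static outputs); placeholders here.
        self.captured: Dict[int, Tuple[str, str]] = {}

    def maybe_get_cuda_graph(
        self, inputs: Dict[str, Any]
    ) -> Tuple[Optional[str], Optional[str]]:
        # Multi-item scoring present: signal the caller to run eagerly.
        if inputs.get("multi_item_part_lens") is not None:
            return None, None
        # No graph captured for this batch size also means eager execution.
        return self.captured.get(inputs["batch_size"], (None, None))


def run_encoder(runner: SketchEncoderGraphRunner, inputs: Dict[str, Any]) -> str:
    """Report which execution path the runner selects for these inputs."""
    graph, _ = runner.maybe_get_cuda_graph(inputs)
    return "eager" if graph is None else "graph"
```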

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

NVIDIA/TensorRT-LLM#14693: Introduced the original multi-item scoring support via multi_item_part_lens in the FlashInfer backend and AttentionForwardArgs, which this PR refactors by moving the handling from the forward path into prepare().

Suggested reviewers

tburt-nv
Funatiq
brb-nv
chang-l
eopXD

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 41.38% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately describes the main change: relocating the `torch_multi_arange` function as a chore task.
Description check	✅ Passed	The PR description follows the template structure, includes a clear explanation referencing the related PR and commit dependencies, specifies test coverage, and completes the PR checklist.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

coderabbitai

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tensorrt_llm/_torch/attention_backend/flashinfer.py`:
- Around line 744-755: The code accesses req_part_lens[0] and req_part_lens[1:]
without validating that each req_part_lens in multi_item_part_lens has the
required structure, which can cause IndexError or ValueError when constructing
tensors for malformed entries like empty lists or lists with only a prefix_len.
Before constructing the prefix_len_ptr and max_item_len_ptr tensors, add
validation to ensure each req_part_lens has at least two elements (one for
prefix_len and at least one for scored items), and raise an API-level ValueError
with a descriptive message if any request part list fails this validation.
- Around line 762-770: The zip() call combining multi_item_part_lens and
token_pos_in_items_raw_lens needs to add strict=True parameter to document that
these iterables have the same length, which resolves the B905 lint finding.
Additionally, replace the list concatenation in the innermost for loop
(req_part_lens[1:] + [token_pos_in_items_len - token_pos_in_items_raw_len]) with
iterable unpacking syntax instead to resolve the RUF005 lint finding.

In `@tensorrt_llm/_torch/utils.py`:
- Around line 574-580: The variable repeats is initialized as an alias to the
ends tensor, and when starts is None, this alias is never broken before the
in-place multiplication operation repeats *= steps.sign() on line 579. This
mutates the caller's ends tensor. Fix this by using out-of-place arithmetic for
the repeat count calculation: instead of the in-place multiplication repeats *=
steps.sign(), use repeats = repeats * steps.sign() to create a new tensor and
avoid mutating the input.
- Around line 584-602: The prev_range_ends calculation using range_ends.roll(1)
doesn't account for empty ranges where repeats == 0. When a range is empty, its
nominal end value should not be used as the previous range end for the next
range; instead, the end of the last non-empty range should be carried forward.
Modify the logic that computes prev_range_ends to propagate the previous
non-empty range's end value through empty ranges, ensuring that jumps
calculations correctly reflect transitions only between actual non-empty ranges.
- Around line 541-557: Replace the assert statements in the function that
validates dtype, shape, and device compatibility between ends, steps, and starts
parameters with explicit ValueError exceptions that include descriptive error
messages. Additionally, add validation at the function entry to ensure that all
input tensors (starts, ends, and steps) are 1-D tensors, raising ValueError if
they are not, since the implementation later uses unsqueeze and torch.cat
operations that expect 1-D inputs.
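
The three `utils.py` findings can be checked against a plain-Python reference. This sketch assumes `torch_multi_arange` returns the concatenation of `arange(start, end, step)` over all triples, which the page does not state explicitly. The function names, argument order, and defaults are hypothetical. It builds new lists rather than updating its inputs, which sidesteps the aliasing problem: in PyTorch, `repeats *= x` writes into the tensor that `repeats` refers to, while `repeats = repeats * x` allocates a new one.

```python
from typing import List, Optional, Sequence


def multi_arange_reference(
    ends: Sequence[int],
    starts: Optional[Sequence[int]] = None,
    steps: Optional[Sequence[int]] = None,
) -> List[int]:
    """Concatenate range(start, end, step) per triple (assumed semantics)."""
    n = len(ends)
    starts = [0] * n if starts is None else starts
    steps = [1] * n if steps is None else steps
    # Explicit ValueError instead of assert, so checks survive `python -O`.
    if len(starts) != n or len(steps) != n:
        raise ValueError("starts, ends, and steps must have the same length")
    if any(step == 0 for step in steps):
        raise ValueError("steps must be non-zero")
    out: List[int] = []
    for start, end, step in zip(starts, ends, steps):
        out.extend(range(start, end, step))  # empty ranges add nothing
    return out


def carry_forward_prev_ends(
    range_ends: Sequence[int], repeats: Sequence[int], initial: int = 0
) -> List[int]:
    """For each range, the end of the last non-empty range before it."""
    prev: List[int] = []
    last = initial
    for end, count in zip(range_ends, repeats):
        prev.append(last)
        if count > 0:  # empty ranges do not update the carried value
            last = end
    return prev
```

A reference like this is mainly useful as a test oracle: inputs that contain empty ranges, such as `starts=[1, 2, 4]` with `ends=[3, 2, 6]`, can be compared element by element against the vectorized implementation.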

In `@tensorrt_llm/llmapi/llm.py`:
- Around line 904-932: The code does not sufficiently validate the structure of
multi_item_part_lens before constructing starts_cuda and ends_cuda, allowing
malformed inputs like [prefix_len] with no item lengths to pass through and fail
later in FlashInfer. Add validation before the torch.tensor calls that construct
starts_cuda and ends_cuda to ensure that each multi_item_part_lens in
batch_multi_item_part_lens has length greater than 1 (meaning at least one item
length in addition to the prefix length) and that all length values are
non-negative. Reject the inputs early with a clear error message if these
conditions are not met.
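
The early rejection described for `llm.py` might look like the sketch below. The function name is hypothetical and the error wording is illustrative. The two conditions (more than one entry per request, no negative lengths) are the ones listed in the comment.

```python
from typing import Sequence


def validate_batch_multi_item_part_lens(
    batch_multi_item_part_lens: Sequence[Sequence[int]],
) -> None:
    """Raise ValueError for malformed multi-item scoring layouts."""
    for idx, part_lens in enumerate(batch_multi_item_part_lens):
        # Each request needs a prefix length plus at least one item length.
        if len(part_lens) <= 1:
            raise ValueError(
                f"request {idx}: expected [prefix_len, item_len, ...] with at "
                f"least one item length, got {list(part_lens)!r}"
            )
        # Lengths are token counts, so negative values are never valid.
        if any(length < 0 for length in part_lens):
            raise ValueError(
                f"request {idx}: lengths must be non-negative, "
                f"got {list(part_lens)!r}"
            )
```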

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 89ea82b0-3e34-43c9-bc70-8761dbd903f9

📥 Commits

Reviewing files that changed from the base of the PR and between 0b0a03e and faad7dc.

📒 Files selected for processing (18)

.pre-commit-config.yaml
legacy-files.txt
pyproject.toml
ruff-legacy.toml
tensorrt_llm/_torch/attention_backend/flashinfer.py
tensorrt_llm/_torch/attention_backend/interface.py
tensorrt_llm/_torch/attention_backend/star_flashinfer.py
tensorrt_llm/_torch/attention_backend/trtllm.py
tensorrt_llm/_torch/attention_backend/vanilla.py
tensorrt_llm/_torch/modules/attention.py
tensorrt_llm/_torch/pyexecutor/cuda_graph_runner.py
tensorrt_llm/_torch/pyexecutor/model_engine.py
tensorrt_llm/_torch/pyexecutor/sampler.py
tensorrt_llm/_torch/pyexecutor/sampling_utils.py
tensorrt_llm/_torch/utils.py
tensorrt_llm/llmapi/llm.py
tests/integration/test_lists/test-db/l0_a10.yml
tests/unittest/_torch/test_torch_multi_arange.py

💤 Files with no reviewable changes (2)

tensorrt_llm/_torch/pyexecutor/sampling_utils.py
tensorrt_llm/_torch/modules/attention.py

tensorrt-cicd · 2026-06-16T20:24:11Z

PR_Github #54587 [ run ] completed with state FAILURE. Commit: faad7dc
/LLM/main/L0_MergeRequest_PR pipeline #43630 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

Please check the failed tests and fix your PR
If you cannot view the failures, ask the CI triggerer to share details
Once fixed, request an NVIDIA team member to trigger CI again

juney-nvidia

Approved

Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>

ixlmar · 2026-06-24T17:53:39Z

/bot run

tensorrt-cicd · 2026-06-24T18:01:01Z

PR_Github #55558 [ run ] triggered by Bot. Commit: 7d978c9

tensorrt-cicd · 2026-06-25T01:14:16Z

PR_Github #55558 [ run ] completed with state SUCCESS. Commit: 7d978c9
/LLM/main/L0_MergeRequest_PR pipeline #44479 completed with status: 'SUCCESS'
Pipeline passed with automatic retried tests. Check the rerun report for details.

github-actions Bot assigned ixlmar Jun 16, 2026

ixlmar requested a review from Funatiq June 16, 2026 12:35

Funatiq approved these changes Jun 16, 2026

tburt-nv approved these changes Jun 16, 2026

brb-nv approved these changes Jun 16, 2026

ixlmar marked this pull request as ready for review June 16, 2026 17:55

ixlmar requested review from a team as code owners June 16, 2026 17:55

ixlmar requested review from HuiGao-NV, ZhanruiSunCh, schetlur-nv, yiqingy0 and yuxianq June 16, 2026 17:55

coderabbitai Bot reviewed Jun 16, 2026

yuxianq reviewed Jun 17, 2026

Comment thread legacy-files.txt Outdated

yuxianq reviewed Jun 17, 2026

Comment thread tensorrt_llm/_torch/attention_backend/interface.py Outdated

MartinMarciniszyn approved these changes Jun 17, 2026

juney-nvidia approved these changes Jun 17, 2026

ixlmar removed request for HuiGao-NV, schetlur-nv and yiqingy0 June 17, 2026 09:37

ixlmar removed the request for review from ZhanruiSunCh June 17, 2026 09:37

chore: relocate torch_multi_arange

6cf0c06

Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>

ixlmar force-pushed the chore/move-torch-multi-arange branch from faad7dc to 6cf0c06 Compare June 24, 2026 10:10

fix: sort legacy-files.txt

3f1df73

Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>

yuxianq approved these changes Jun 24, 2026

ixlmar added 2 commits June 24, 2026 15:46

address review comments

8df1ba1

Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>

fix: handling of empty ranges

7d978c9

Signed-off-by: ixlmar <206748156+ixlmar@users.noreply.github.com>

ixlmar merged commit 157e533 into NVIDIA:main Jun 25, 2026
7 checks passed

ixlmar deleted the chore/move-torch-multi-arange branch June 25, 2026 07:30

ixlmar mentioned this pull request Jun 25, 2026

[TRTLLM-12982][chore] improve multi-item scoring request validation #15627

Open

1 task

8 participants