[TRTLLM-12622][feat] Reland: Add native post-processing hook to trtllm-serve by xwang233 · Pull Request #15631 · NVIDIA/TensorRT-LLM

xwang233 · 2026-06-25T16:05:17Z

Description

Reland of #15239 (native per-request post-processing hook for trtllm-serve, TRTLLM-12622), which was reverted in #15629.

The original PR was reverted because it broke AutoDeploy unit tests relying on DemoLLM with AttributeError: 'DemoLLM' object has no attribute '_post_processor_hook'. DemoLLM.__init__ does not call super().__init__(), so it never picked up the _post_processor_hook attribute that BaseLLM.__init__ sets and that generate_async reads.

This reland re-applies #15239 unchanged and folds in the minimal fix from #15628: initialize self._post_processor_hook = None in DemoLLM.__init__. DemoLLM is the only LLM subclass that bypasses BaseLLM.__init__; all others chain super().__init__(), so no other class needs the attribute.

Test Coverage

Re-applies all tests from [TRTLLM-12622][feat] Add native post-processing hook to trtllm-serve #15239 (unit, api_stability, llm_args, integration e2e).
Fix verified against the failures from [TRTLLM-12622][feat] Add native post-processing hook to trtllm-serve #15239's CI run (build 43475): the affected AutoDeploy test_build_ad / test_eagle cases failed with the _post_processor_hook AttributeError, which this patch resolves.

PR Checklist

Please check this after reviewing the above items as appropriate for this PR.

Summary by CodeRabbit

New Features
- Added support for a user-defined post-processing hook in server and client flows.
- Published documentation covering setup, behavior, supported verdicts, and usage examples.
Bug Fixes
- Improved handling of late or missing responses to avoid request-processing errors.
- Made hook-driven termination end requests promptly and safely.
Tests
- Added unit and end-to-end coverage for rewriting, suppression, termination, and server compatibility checks.

Reland of NVIDIA#15239 (reverted in NVIDIA#15629). Re-applies the native per-request post-processing hook for trtllm-serve, plus the follow-up fix from NVIDIA#15628: initialize DemoLLM._post_processor_hook to None so AutoDeploy unit tests relying on DemoLLM no longer fail with AttributeError. Signed-off-by: Xiao Wang <24860335+xwang233@users.noreply.github.com>

xwang233 · 2026-06-25T16:06:58Z

/bot run --disable-fail-fast

tensorrt-cicd · 2026-06-25T16:13:25Z

PR_Github #55825 [ run ] triggered by Bot. Commit: b315e71 Link to invocation

coderabbitai · 2026-06-25T16:20:22Z

📝 Walkthrough

Walkthrough

Adds a configurable post-processing hook for detokenized output, wires it through serve/LLM/executor paths, adds guardrails for tokenizer and harmony modes, and includes unit, integration, and documentation coverage.

Changes

Post-processing hook feature

Layer / File(s)	Summary
Hook contract and unit behavior `tensorrt_llm/executor/postprocessor_hook.py`, `tests/unittest/executor/test_postprocessor_hook.py`	Defines the hook payload, verdict types, loader, and in-place output mutation logic, with unit tests for emit, suppress, terminate, loader behavior, and state handling.
Result and worker plumbing `tensorrt_llm/executor/result.py`, `tensorrt_llm/executor/postproc_worker.py`, `tensorrt_llm/executor/worker.py`, `tensorrt_llm/executor/proxy.py`, `tests/unittest/executor/test_postprocessor_hook.py`, `tests/unittest/executor/test_proxy_postproc_terminate.py`	Adds hook storage and detokenization in results, threads the hook through postprocess worker startup and record handling, updates proxy cleanup and late-response handling, and adds matching result/proxy tests.
CLI and runtime configuration `tensorrt_llm/commands/serve.py`, `tensorrt_llm/llmapi/llm_args.py`, `tensorrt_llm/llmapi/llm.py`, `tensorrt_llm/_torch/auto_deploy/llm.py`, `tensorrt_llm/serve/openai_server.py`, `tests/unittest/api_stability/references/llm.yaml`, `tests/unittest/llmapi/test_llm_args.py`, `tests/unittest/executor/test_postprocessor_hook.py`	Adds the `post_processor_hook` CLI and LLM argument plumbing, resolves the hook during LLM initialization, propagates it into request outputs and executor config, initializes the DemoLLM hook field, and rejects incompatible tokenizer-skipping and harmony combinations with schema and unit tests.
Docs and OpenAI coverage `docs/source/features/post-processor-hook.md`, `docs/source/index.rst`, `tests/integration/defs/test_e2e.py`, `tests/integration/test_lists/test-db/l0_a10.yml`, `tests/unittest/llmapi/apps/_postproc_hook_samples.py`, `tests/unittest/llmapi/apps/_test_openai_post_processor.py`	Adds the feature documentation page and index entry, sample hook implementations, the E2E wrapper and test-list entries, and OpenAI completion/chat tests across streaming, non-streaming, and `detokenize=false` cases.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

NVIDIA/TensorRT-LLM#15239: Introduces the same post-processor hook feature across the executor, CLI/server wiring, and tests at the same code paths.
NVIDIA/TensorRT-LLM#15629: Reverts the post-processor hook feature across the same hook, CLI, executor, and test paths.

Suggested reviewers

suyoggupta
JunyiXu-nv
chang-l
nv-guomingz

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 31.87% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly identifies the reland and the main feature change: a native post-processing hook for trtllm-serve.
Description check	✅ Passed	The description covers the required Description, Test Coverage, and PR Checklist sections with clear, relevant details.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands.}

coderabbitai

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

tensorrt_llm/_torch/auto_deploy/llm.py (1)
212-226: 🎯 Functional Correctness | 🟠 Major

Reject or load post_processor_hook in DemoLLM

DemoLLM inherits LlmArgs, which already accepts post_processor_hook, but this constructor never loads it and hard-sets self._post_processor_hook = None. That silently disables the hook on the demollm path. Either reject the option here or thread it through the same loader used by LLM.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tensorrt_llm/_torch/auto_deploy/llm.py` around lines 212 - 226, DemoLLM
currently accepts post_processor_hook through LlmArgs but drops it by always
setting self._post_processor_hook to None, which disables the hook on the
demollm path. Update DemoLLM.__init__ to either explicitly reject
post_processor_hook in the same way it rejects encode_only, or load and assign
it using the same hook-loading logic as LLM so the argument is honored
consistently.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/source/features/post-processor-hook.md`:
- Around line 30-34: The docs example in the post-processor hook section uses
the less-preferred `--extra_llm_api_options` path for a config-file setup;
update the `trtllm-serve` example to show passing the YAML via `--config`
instead. Keep the `post_processor_hook` setting associated with the same
example, and adjust the surrounding wording so it matches the preferred
configuration style used in the docs.

In `@tensorrt_llm/commands/serve.py`:
- Around line 640-652: The new post-processing hook option is being accepted by
trtllm-serve even when the visual-gen path is used, where it is not forwarded or
enforced. Update the serve command flow around the visual-gen branch and
_serve_visual_gen() to explicitly reject --post_processor_hook for visual-gen
serving, or otherwise propagate it only if that path truly supports it. Use the
existing serve.py option wiring and the visual-gen dispatch logic as the place
to add the validation so unsupported configurations fail fast instead of being
silently ignored.

In `@tensorrt_llm/executor/postproc_worker.py`:
- Around line 222-226: The fast-path in postproc_worker’s response handling is
using record._done to force is_final, but _done is also set by ordinary
stop-string handling in DetokenizedGenerationResultBase._handle_response().
Update the terminate-record check in the worker logic to key off record._aborted
instead, so only the terminate-hook path forces the request finalization and
non-aborted generations still wait for the engine’s real final response.

In `@tensorrt_llm/executor/postprocessor_hook.py`:
- Around line 181-233: The post-processor handling in the result.outputs loop
does not fully fail the chunk closed after a TERMINATE verdict, so sibling
outputs can still be serialized. Update the logic around
PostProcessorHookAction.TERMINATE in postprocessor_hook.py so that once one
output triggers terminate, the remaining outputs in the same result are
blanked/suppressed and marked stopped before the loop exits. Use the existing
result._aborted/result._done flags and output.finish_reason/output.stop_reason
handling in this loop to ensure multi-output requests cannot leak sibling text.

In `@tests/unittest/executor/test_postprocessor_hook.py`:
- Around line 82-156: Add a test in test_postprocessor_hook around
apply_post_processor_hook and terminate() that uses two CompletionOutput objects
in result.outputs, triggers termination from one output, and verifies the whole
request is cancelled. Assert the sibling output is also blanked/withheld and
gets the stopped/finished state expected from terminate(), not just the
triggering output, so the request-wide behavior is covered.

---

Outside diff comments:
In `@tensorrt_llm/_torch/auto_deploy/llm.py`:
- Around line 212-226: DemoLLM currently accepts post_processor_hook through
LlmArgs but drops it by always setting self._post_processor_hook to None, which
disables the hook on the demollm path. Update DemoLLM.__init__ to either
explicitly reject post_processor_hook in the same way it rejects encode_only, or
load and assign it using the same hook-loading logic as LLM so the argument is
honored consistently.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 81a87132-605c-427c-8d5f-ac6decbe1786

📥 Commits

Reviewing files that changed from the base of the PR and between edb14ee and b315e71.

📒 Files selected for processing (20)

docs/source/features/post-processor-hook.md
docs/source/index.rst
tensorrt_llm/_torch/auto_deploy/llm.py
tensorrt_llm/commands/serve.py
tensorrt_llm/executor/postproc_worker.py
tensorrt_llm/executor/postprocessor_hook.py
tensorrt_llm/executor/proxy.py
tensorrt_llm/executor/result.py
tensorrt_llm/executor/worker.py
tensorrt_llm/llmapi/llm.py
tensorrt_llm/llmapi/llm_args.py
tensorrt_llm/serve/openai_server.py
tests/integration/defs/test_e2e.py
tests/integration/test_lists/test-db/l0_a10.yml
tests/unittest/api_stability/references/llm.yaml
tests/unittest/executor/test_postprocessor_hook.py
tests/unittest/executor/test_proxy_postproc_terminate.py
tests/unittest/llmapi/apps/_postproc_hook_samples.py
tests/unittest/llmapi/apps/_test_openai_post_processor.py
tests/unittest/llmapi/test_llm_args.py

govind-ramnarayan

Thanks for fixing this for AutoDeploy!

xwang233 requested review from a team as code owners June 25, 2026 16:05

xwang233 requested review from Shixiaowei02, arysef, schetlur-nv and suyoggupta June 25, 2026 16:05

github-actions Bot assigned xwang233 Jun 25, 2026

xwang233 added the api-compatible Accepted LLM API contract change that is backwards-compatible label Jun 25, 2026

coderabbitai Bot reviewed Jun 25, 2026

View reviewed changes

Comment thread docs/source/features/post-processor-hook.md

Comment thread tensorrt_llm/commands/serve.py

Comment thread tensorrt_llm/executor/postproc_worker.py

Comment thread tensorrt_llm/executor/postprocessor_hook.py

Comment thread tests/unittest/executor/test_postprocessor_hook.py

govind-ramnarayan approved these changes Jun 25, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[TRTLLM-12622][feat] Reland: Add native post-processing hook to trtllm-serve#15631

[TRTLLM-12622][feat] Reland: Add native post-processing hook to trtllm-serve#15631
xwang233 wants to merge 1 commit into
NVIDIA:mainfrom
xwang233:reland-trtllm-12622-post-processor-hook

xwang233 commented Jun 25, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

xwang233 commented Jun 25, 2026

Uh oh!

tensorrt-cicd commented Jun 25, 2026

Uh oh!

coderabbitai Bot commented Jun 25, 2026

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

govind-ramnarayan left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

xwang233 commented Jun 25, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Description

Test Coverage

PR Checklist

Summary by CodeRabbit

Uh oh!

xwang233 commented Jun 25, 2026

Uh oh!

tensorrt-cicd commented Jun 25, 2026

Uh oh!

coderabbitai Bot commented Jun 25, 2026

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

govind-ramnarayan left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

xwang233 commented Jun 25, 2026 •

edited by coderabbitai Bot

Loading