[TRTLLM-12622][feat] Reland: Add native post-processing hook to trtllm-serve#15631
[TRTLLM-12622][feat] Reland: Add native post-processing hook to trtllm-serve#15631xwang233 wants to merge 1 commit into
Conversation
Reland of NVIDIA#15239 (reverted in NVIDIA#15629). Re-applies the native per-request post-processing hook for trtllm-serve, plus the follow-up fix from NVIDIA#15628: initialize DemoLLM._post_processor_hook to None so AutoDeploy unit tests relying on DemoLLM no longer fail with AttributeError. Signed-off-by: Xiao Wang <24860335+xwang233@users.noreply.github.com>
|
/bot run --disable-fail-fast |
|
PR_Github #55825 [ run ] triggered by Bot. Commit: |
📝 WalkthroughWalkthroughAdds a configurable post-processing hook for detokenized output, wires it through serve/LLM/executor paths, adds guardrails for tokenizer and harmony modes, and includes unit, integration, and documentation coverage. ChangesPost-processing hook feature
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tensorrt_llm/_torch/auto_deploy/llm.py (1)
212-226: 🎯 Functional Correctness | 🟠 MajorReject or load
post_processor_hookinDemoLLM
DemoLLMinheritsLlmArgs, which already acceptspost_processor_hook, but this constructor never loads it and hard-setsself._post_processor_hook = None. That silently disables the hook on thedemollmpath. Either reject the option here or thread it through the same loader used byLLM.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tensorrt_llm/_torch/auto_deploy/llm.py` around lines 212 - 226, DemoLLM currently accepts post_processor_hook through LlmArgs but drops it by always setting self._post_processor_hook to None, which disables the hook on the demollm path. Update DemoLLM.__init__ to either explicitly reject post_processor_hook in the same way it rejects encode_only, or load and assign it using the same hook-loading logic as LLM so the argument is honored consistently.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/source/features/post-processor-hook.md`:
- Around line 30-34: The docs example in the post-processor hook section uses
the less-preferred `--extra_llm_api_options` path for a config-file setup;
update the `trtllm-serve` example to show passing the YAML via `--config`
instead. Keep the `post_processor_hook` setting associated with the same
example, and adjust the surrounding wording so it matches the preferred
configuration style used in the docs.
In `@tensorrt_llm/commands/serve.py`:
- Around line 640-652: The new post-processing hook option is being accepted by
trtllm-serve even when the visual-gen path is used, where it is not forwarded or
enforced. Update the serve command flow around the visual-gen branch and
_serve_visual_gen() to explicitly reject --post_processor_hook for visual-gen
serving, or otherwise propagate it only if that path truly supports it. Use the
existing serve.py option wiring and the visual-gen dispatch logic as the place
to add the validation so unsupported configurations fail fast instead of being
silently ignored.
In `@tensorrt_llm/executor/postproc_worker.py`:
- Around line 222-226: The fast-path in postproc_worker’s response handling is
using record._done to force is_final, but _done is also set by ordinary
stop-string handling in DetokenizedGenerationResultBase._handle_response().
Update the terminate-record check in the worker logic to key off record._aborted
instead, so only the terminate-hook path forces the request finalization and
non-aborted generations still wait for the engine’s real final response.
In `@tensorrt_llm/executor/postprocessor_hook.py`:
- Around line 181-233: The post-processor handling in the result.outputs loop
does not fully fail the chunk closed after a TERMINATE verdict, so sibling
outputs can still be serialized. Update the logic around
PostProcessorHookAction.TERMINATE in postprocessor_hook.py so that once one
output triggers terminate, the remaining outputs in the same result are
blanked/suppressed and marked stopped before the loop exits. Use the existing
result._aborted/result._done flags and output.finish_reason/output.stop_reason
handling in this loop to ensure multi-output requests cannot leak sibling text.
In `@tests/unittest/executor/test_postprocessor_hook.py`:
- Around line 82-156: Add a test in test_postprocessor_hook around
apply_post_processor_hook and terminate() that uses two CompletionOutput objects
in result.outputs, triggers termination from one output, and verifies the whole
request is cancelled. Assert the sibling output is also blanked/withheld and
gets the stopped/finished state expected from terminate(), not just the
triggering output, so the request-wide behavior is covered.
---
Outside diff comments:
In `@tensorrt_llm/_torch/auto_deploy/llm.py`:
- Around line 212-226: DemoLLM currently accepts post_processor_hook through
LlmArgs but drops it by always setting self._post_processor_hook to None, which
disables the hook on the demollm path. Update DemoLLM.__init__ to either
explicitly reject post_processor_hook in the same way it rejects encode_only, or
load and assign it using the same hook-loading logic as LLM so the argument is
honored consistently.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 81a87132-605c-427c-8d5f-ac6decbe1786
📒 Files selected for processing (20)
docs/source/features/post-processor-hook.mddocs/source/index.rsttensorrt_llm/_torch/auto_deploy/llm.pytensorrt_llm/commands/serve.pytensorrt_llm/executor/postproc_worker.pytensorrt_llm/executor/postprocessor_hook.pytensorrt_llm/executor/proxy.pytensorrt_llm/executor/result.pytensorrt_llm/executor/worker.pytensorrt_llm/llmapi/llm.pytensorrt_llm/llmapi/llm_args.pytensorrt_llm/serve/openai_server.pytests/integration/defs/test_e2e.pytests/integration/test_lists/test-db/l0_a10.ymltests/unittest/api_stability/references/llm.yamltests/unittest/executor/test_postprocessor_hook.pytests/unittest/executor/test_proxy_postproc_terminate.pytests/unittest/llmapi/apps/_postproc_hook_samples.pytests/unittest/llmapi/apps/_test_openai_post_processor.pytests/unittest/llmapi/test_llm_args.py
govind-ramnarayan
left a comment
There was a problem hiding this comment.
Thanks for fixing this for AutoDeploy!
Description
Reland of #15239 (native per-request post-processing hook for
trtllm-serve, TRTLLM-12622), which was reverted in #15629.The original PR was reverted because it broke AutoDeploy unit tests relying on
DemoLLMwithAttributeError: 'DemoLLM' object has no attribute '_post_processor_hook'.DemoLLM.__init__does not callsuper().__init__(), so it never picked up the_post_processor_hookattribute thatBaseLLM.__init__sets and thatgenerate_asyncreads.This reland re-applies #15239 unchanged and folds in the minimal fix from #15628: initialize
self._post_processor_hook = NoneinDemoLLM.__init__.DemoLLMis the onlyLLMsubclass that bypassesBaseLLM.__init__; all others chainsuper().__init__(), so no other class needs the attribute.Test Coverage
test_build_ad/test_eaglecases failed with the_post_processor_hookAttributeError, which this patch resolves.PR Checklist
Summary by CodeRabbit
New Features
Bug Fixes
Tests