
feat: add TRT-RTX native CUDA graph support#4187

Draft
tp5uiuc wants to merge 5 commits into pytorch:main from tp5uiuc:feat/trtrtx-cudagraphs

Conversation

Contributor

tp5uiuc commented Apr 14, 2026

Description

Add cuda_graph_strategy compilation setting and automatic RTX-native CUDA graph integration for the Python runtime path (PythonTorchTensorRTModule).

TensorRT-RTX has native CUDA graph support via IRuntimeConfig.cuda_graph_strategy, where the JIT compiler handles capture/replay/invalidation internally. This is superior to manual torch.cuda.CUDAGraph() capture on RTX because:

  • Manual capture freezes fallback kernels; lazy-compiled specialized kernels can never replace them
  • Runtime allocation or data-dependent shapes can cause cudaStreamBeginCapture to fail
  • The JIT compiler automatically manages graph staleness (shape changes, pointer changes, kernel readiness)

Key changes

  • New cuda_graph_strategy setting on CompilationSettings ("disabled" / "whole_graph_capture")
  • Mapped to trt.CudaGraphStrategy on IRuntimeConfig (same pattern as dynamic_shapes_kernel_specialization_strategy)
  • SUBGRAPH mode (set_cudagraphs_mode(True)): On RTX, always use RTX-native CUDA graphs — manual capture is bypassed. If cuda_graph_strategy was not explicitly set, the runtime overrides to whole_graph_capture and warns.
  • WHOLE_GRAPH mode (enable_cudagraphs() with mixed TRT + PyTorch ops): Validates all TRT engines are monolithically capturable via context.is_stream_capturable(stream) and strategy != "lazy". If capturable, proceeds with outer monolithic capture (RTX-native disabled per-engine). If not capturable, raises RuntimeError.
  • _is_monolithic_capturable() — runtime check combining stream capturability and kernel specialization strategy
  • _enable_rtx_native_cudagraphs() — recreates execution context with WHOLE_GRAPH_CAPTURE
  • _check_monolithic_capturability() in CudaGraphsTorchTensorRTModule for mixed graph validation
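The capturability rule behind `_is_monolithic_capturable()` can be sketched in isolation. This is an illustrative stand-in, not the PR's implementation: `FakeContext` is a hypothetical stub for the TensorRT execution context, which in the real code exposes `is_stream_capturable(stream)`.

```python
# Sketch of the monolithic-capturability rule described above.
# FakeContext is a hypothetical stand-in for a TRT execution context.
from dataclasses import dataclass


@dataclass
class FakeContext:
    """Stub mimicking the assumed is_stream_capturable() interface."""
    stream_capturable: bool

    def is_stream_capturable(self, stream: int) -> bool:
        return self.stream_capturable


def is_monolithic_capturable(context, stream: int, specialization_strategy: str) -> bool:
    """An engine may join an outer monolithic capture only if its stream is
    capturable AND kernel specialization is not lazy: a kernel JIT-compiled
    after capture could never replace the one frozen into the graph."""
    return context.is_stream_capturable(stream) and specialization_strategy != "lazy"
```

With an `"eager"` strategy and a capturable stream the check passes; a `"lazy"` strategy or an uncapturable stream fails it, which is what triggers the `RuntimeError` path in WHOLE_GRAPH mode.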

Behavior matrix

| Graph type | CUDA graph mode | RTX? | Behavior |
|---|---|---|---|
| TRT-only | SUBGRAPH | Yes | RTX-native always (override if needed) |
| TRT-only | SUBGRAPH | No | Manual capture (existing) |
| Mixed | WHOLE_GRAPH | Yes + capturable | Monolithic capture; RTX-native disabled per-engine |
| Mixed | WHOLE_GRAPH | Yes + NOT capturable | RuntimeError |
| Mixed | WHOLE_GRAPH | No | Monolithic capture (existing) |
| Any | No cudagraphs + strategy set | Yes | RTX-native runs transparently |
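The matrix can be encoded as a small dispatch function. This is a sketch that only mirrors the table above: the function name, mode strings, and return labels are hypothetical, not identifiers from the PR.

```python
# Hypothetical encoding of the behavior matrix; labels are illustrative.
def cudagraph_behavior(mode: str, is_rtx: bool, capturable: bool = True) -> str:
    if mode == "SUBGRAPH":  # TRT-only graphs
        return "rtx_native" if is_rtx else "manual_capture"
    if mode == "WHOLE_GRAPH":  # mixed TRT + PyTorch ops
        if not is_rtx:
            return "monolithic_capture"
        if capturable:
            # Outer capture proceeds; RTX-native graphs disabled per-engine.
            return "monolithic_capture_rtx_native_disabled"
        raise RuntimeError("TRT engine is not monolithically capturable on RTX")
    # No cudagraphs mode: with a strategy set, RTX-native runs transparently.
    return "rtx_native_transparent" if is_rtx else "no_cudagraphs"
```

Note the one hard-failure cell: WHOLE_GRAPH on RTX with a non-capturable engine raises rather than silently falling back, matching the validation described for `_check_monolithic_capturability()`.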

Depends on #4180 (runtime cache) and #4184 (dynamic shapes strategy).

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that relevant reviewers are notified

tp5uiuc and others added 5 commits April 10, 2026 13:16
Add runtime cache support for TensorRT-RTX JIT compilation results,
replacing the timing cache which is not used by RTX (no autotuning).

Changes:
- Skip timing cache creation/saving for TensorRT-RTX in _TRTInterpreter
- Add RUNTIME_CACHE_PATH default and runtime_cache_path setting
- Wire up IRuntimeCache in PythonTorchTensorRTModule (setup, load, save)
- Persist runtime cache to disk with filelock for concurrent access safety
- Thread runtime_cache_path through all compile functions
- Add unit tests (12 tests) and E2E model tests (6 tests)
- Update docstrings and RST documentation

Fixes pytorch#3817

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Version provided by upstream torch; no pin needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Expose IRuntimeConfig.setDynamicShapesKernelSpecializationStrategy()
through the Torch-TensorRT Python API. Users can now control how
shape-specialized kernels are compiled at runtime for dynamic shapes
on TensorRT-RTX via the new `dynamic_shapes_kernel_specialization_strategy`
compilation setting ("lazy", "eager", or "none").

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Address review feedback: compile with torchtrt.Input min/opt/max
ranges so dynamic shapes are actually exercised.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add cuda_graph_strategy compilation setting and automatic RTX-native
CUDA graph integration for the Python runtime path.

Key changes:
- New cuda_graph_strategy setting ("disabled" / "whole_graph_capture")
  on CompilationSettings, mapped to trt.CudaGraphStrategy on
  IRuntimeConfig (same pattern as dynamic_shapes_kernel_specialization)
- In SUBGRAPH cudagraph mode on RTX, always use RTX-native CUDA graphs
  (manual torch.cuda.CUDAGraph capture is not safe due to lazy kernel
  specialization and potential runtime allocation)
- _is_monolithic_capturable() check using context.is_stream_capturable()
  and strategy != "lazy" for WHOLE_GRAPH mode safety validation
- _enable_rtx_native_cudagraphs() for runtime context recreation
- _check_monolithic_capturability() in CudaGraphsTorchTensorRTModule
  for mixed TRT + PyTorch graph validation
- Comprehensive unit tests covering all code paths

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@meta-cla meta-cla bot added the cla signed label Apr 14, 2026
@github-actions github-actions bot added documentation Improvements or additions to documentation component: tests Issues re: Tests component: conversion Issues re: Conversion stage component: core Issues re: The core compiler component: build system Issues re: Build system component: api [Python] Issues re: Python API component: runtime component: dynamo Issues relating to the `torch.compile` or `torch._dynamo.export` paths labels Apr 14, 2026
@github-actions github-actions bot requested a review from zewenli98 April 14, 2026 11:12