
feat: add runtime cache API for TensorRT-RTX#4180

Open
tp5uiuc wants to merge 2 commits into pytorch:main from tp5uiuc:feat/runtime-cache-rtx

Conversation

@tp5uiuc
Contributor

@tp5uiuc tp5uiuc commented Apr 10, 2026

Description

Add runtime cache support for TensorRT-RTX JIT compilation results, replacing the timing cache, which is not used by RTX (no autotuning).

TensorRT-RTX uses JIT compilation at inference time. The runtime cache (IRuntimeCache) stores these compilation results so that kernels and execution graphs are not recompiled on subsequent runs. This is analogous to the timing cache but operates at inference time rather than build time.
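The load-on-setup / save-on-teardown lifecycle described above can be sketched in pure Python. This is illustrative only: `RuntimeCacheManager` and the bytes blob are hypothetical stand-ins for the actual TensorRT-RTX `IRuntimeCache` object and the Torch-TensorRT wiring, not the real API.

```python
import os
import tempfile


class RuntimeCacheManager:
    """Illustrative stand-in for the IRuntimeCache wiring described above.

    Loads serialized JIT-compilation results from disk when the module is
    set up, and persists them again on teardown so later runs skip
    recompilation. A plain bytes blob stands in for the real cache object.
    """

    def __init__(self, cache_path: str) -> None:
        self.cache_path = cache_path
        self.blob = b""  # stands in for the serialized runtime cache
        if os.path.exists(cache_path):
            with open(cache_path, "rb") as f:
                self.blob = f.read()  # warm start: reuse prior JIT results

    def save(self) -> None:
        # Persist on module destruction so the next process warms up faster.
        with open(self.cache_path, "wb") as f:
            f.write(self.blob)


# Usage: the first run starts cold, the second finds the persisted cache.
path = os.path.join(tempfile.mkdtemp(), "rtx_runtime.cache")
first = RuntimeCacheManager(path)
first.blob = b"compiled-kernels"  # in reality, JIT compilation fills this
first.save()
second = RuntimeCacheManager(path)
print(second.blob)  # b'compiled-kernels'
```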

Fixes #3817

Changes

  • Skip timing cache for RTX: Early return in _create_timing_cache() and _save_timing_cache() when ENABLED_FEATURES.tensorrt_rtx is True (timing cache is a no-op in TRT-RTX)
  • Add runtime_cache_path setting: New RUNTIME_CACHE_PATH default and runtime_cache_path field in CompilationSettings, threaded through all compile functions
  • Wire up IRuntimeCache in PythonTorchTensorRTModule: Create RuntimeConfig with runtime cache on engine setup, load from disk if available, save on module destruction
  • File locking: Uses filelock for concurrent access safety when multiple processes share the same cache file
  • Documentation: Updated docstrings, compilation settings RST, and engine cache tutorial with new "Runtime Cache (TensorRT-RTX)" section
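The first bullet's early-return guard can be sketched as follows. The `ENABLED_FEATURES` dataclass here is a local stand-in for Torch-TensorRT's real feature-flag object, and the dict mutation stands in for attaching a real timing cache to a builder config.

```python
from dataclasses import dataclass


@dataclass
class EnabledFeatures:
    # Stand-in for torch_tensorrt's ENABLED_FEATURES flag object.
    tensorrt_rtx: bool = False


ENABLED_FEATURES = EnabledFeatures(tensorrt_rtx=True)


def _create_timing_cache(builder_config: dict, timing_cache_path: str) -> None:
    # TensorRT-RTX does no autotuning, so the timing cache is a no-op:
    # return early instead of creating and attaching one.
    if ENABLED_FEATURES.tensorrt_rtx:
        return
    # ... standard TensorRT path: load the cache file, attach it here ...
    builder_config["timing_cache"] = timing_cache_path


config: dict = {}
_create_timing_cache(config, "/tmp/timing.cache")
print(config)  # {} — untouched because the RTX path returned early
```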

Type of change

  • New feature (non-breaking change which adds functionality)
  • This change requires a documentation update

Checklist:

  • My code follows the style guidelines of this project (You can use the linters)
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas and hacks
  • I have made corresponding changes to the documentation
  • I have added tests to verify my fix or my feature
  • New and existing unit tests pass locally with my changes
  • I have added the relevant labels to my PR so that relevant reviewers are notified

Add runtime cache support for TensorRT-RTX JIT compilation results,
replacing the timing cache which is not used by RTX (no autotuning).

Changes:
- Skip timing cache creation/saving for TensorRT-RTX in _TRTInterpreter
- Add RUNTIME_CACHE_PATH default and runtime_cache_path setting
- Wire up IRuntimeCache in PythonTorchTensorRTModule (setup, load, save)
- Persist runtime cache to disk with filelock for concurrent access safety
- Thread runtime_cache_path through all compile functions
- Add unit tests (12 tests) and E2E model tests (6 tests)
- Update docstrings and RST documentation

Fixes pytorch#3817

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@meta-cla meta-cla bot added the cla signed label Apr 10, 2026
@tp5uiuc tp5uiuc marked this pull request as draft April 10, 2026 20:18
@github-actions github-actions bot added the documentation, component: tests, component: conversion, component: core, component: api [Python], component: runtime, and component: dynamo labels Apr 10, 2026
@github-actions github-actions bot requested a review from cehongwang April 10, 2026 20:18
The timing cache is **not used with TensorRT-RTX**, which does not perform
autotuning. For TensorRT-RTX, see the *Runtime Cache* section below.

Runtime Cache (TensorRT-RTX)
Contributor Author


I have added the runtime cache to both APIs and docs, but these are shared between TensorRT Enterprise and TensorRT-RTX. I don't know if that's OK.

@@ -0,0 +1,287 @@
import gc
Contributor Author


Let me know if the filename needs changing

@@ -0,0 +1,329 @@
import gc
Contributor Author


Do these tests get picked up automatically, or is there a test list that we should add the new tests to?

logger.debug(f"No existing runtime cache at {self.runtime_cache_path}")
return
try:
from filelock import FileLock
Contributor Author


filelock is already a torch dependency, so we are not introducing any additional dependencies just for this feature. The version requirement will be kept unpinned so that torch is the one providing the right version.
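The locking pattern under discussion can be sketched as below. The `filelock` `FileLock` context manager is the real library API; the helper function names and the stdlib `nullcontext` fallback (so the sketch runs even without filelock installed) are illustrative, not the PR's actual code.

```python
import contextlib
import os
import tempfile

try:
    from filelock import FileLock  # provided transitively by torch
except ImportError:
    # Fallback for this sketch only: no cross-process safety.
    FileLock = lambda path: contextlib.nullcontext()


def save_runtime_cache(cache_path: str, blob: bytes) -> None:
    # Serialize under an exclusive file lock so multiple processes
    # sharing cache_path do not interleave partial writes.
    with FileLock(cache_path + ".lock"):
        with open(cache_path, "wb") as f:
            f.write(blob)


def load_runtime_cache(cache_path: str) -> bytes:
    with FileLock(cache_path + ".lock"):
        with open(cache_path, "rb") as f:
            return f.read()


path = os.path.join(tempfile.mkdtemp(), "rtx_runtime.cache")
save_runtime_cache(path, b"jit-results")
print(load_runtime_cache(path))  # b'jit-results'
```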

Version provided by upstream torch; no pin needed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions bot added the component: build system Issues re: Build system label Apr 10, 2026
ENABLED_FEATURES.tensorrt_rtx,
"This test verifies standard TRT behavior (non-RTX)",
)
class TestNonRTXUnchanged(TestCase):
Contributor Author

@tp5uiuc tp5uiuc Apr 10, 2026


This can be removed if you prefer; let me know (I asked Claude to be extra defensive).
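The gating shown in the diff fragment above uses the standard `unittest.skipIf` pattern. Here is a self-contained sketch; `FakeFeatures` is a local stand-in for Torch-TensorRT's `ENABLED_FEATURES`, and the test body is a dummy.

```python
import unittest


class FakeFeatures:
    # Stand-in for torch_tensorrt's ENABLED_FEATURES.
    tensorrt_rtx = True


ENABLED_FEATURES = FakeFeatures()


@unittest.skipIf(
    ENABLED_FEATURES.tensorrt_rtx,
    "This test verifies standard TRT behavior (non-RTX)",
)
class TestNonRTXUnchanged(unittest.TestCase):
    def test_timing_cache_still_created(self):
        # Never executes when tensorrt_rtx is enabled: the whole
        # class is reported as skipped, not failed.
        self.fail("should not run under RTX")


suite = unittest.TestLoader().loadTestsFromTestCase(TestNonRTXUnchanged)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
print(len(outcome.skipped))  # 1 — the single test was skipped
```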

--extra-index-url https://pypi.ngc.nvidia.com
pyyaml
dllist
filelock
Contributor Author


Torch doesn't pin filelock either, so there should be no dependency resolution failures, I think.

base_requirements = [
"packaging>=23",
"typing-extensions>=4.7.0",
"filelock",
Contributor Author


uv.lock already has filelock because of torch

@tp5uiuc tp5uiuc marked this pull request as ready for review April 10, 2026 20:58
@cehongwang cehongwang requested review from lanluo-nvidia and removed request for cehongwang April 11, 2026 00:17
if ENABLED_FEATURES.tensorrt_rtx:
self._setup_runtime_config()

self.context = self._create_context()
Contributor Author


This only targets the Python runtime. The same goes for the dynamic shapes and CUDA graphs MRs that are to follow.

The C++ runtime changes potentially need an ABI change, so I will put them in a separate MR after all the Python-only changes are finalized.

@@ -257,7 +264,7 @@ def set_device_memory_budget(self, budget_bytes: int) -> int:
if self.context is not None:
Contributor Author


Should there be a call to self._check_initialized()?

dryrun: bool = _defaults.DRYRUN,
hardware_compatible: bool = _defaults.HARDWARE_COMPATIBLE,
timing_cache_path: str = _defaults.TIMING_CACHE_PATH,
runtime_cache_path: str = _defaults.RUNTIME_CACHE_PATH,
Contributor Author


Runtime cache is a JIT-time API: it may not make much sense for cross_compile_for_windows and convert_exported_program_to_serialized_trt_engine. I have added it to the interface as a common API for the entry points into torch-TRT, but I can add it to unsupported_settings.
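An unsupported_settings-style guard for those build-time-only entry points could look roughly like this. Everything here is hypothetical: the function name, the default-path constant, and the entry-point names used as dictionary keys are illustrative, not Torch-TensorRT's actual validation code.

```python
# Hypothetical default, mirroring a RUNTIME_CACHE_PATH-style setting.
RUNTIME_CACHE_PATH_DEFAULT = "/tmp/torch_tensorrt/runtime.cache"


def check_unsupported_settings(entry_point: str, runtime_cache_path: str) -> None:
    # Build-time-only entry points cannot use a JIT-time runtime cache;
    # reject a non-default value instead of silently ignoring it.
    build_time_only = {
        "cross_compile_for_windows",
        "convert_exported_program_to_serialized_trt_engine",
    }
    if entry_point in build_time_only and runtime_cache_path != RUNTIME_CACHE_PATH_DEFAULT:
        raise ValueError(
            f"runtime_cache_path is not supported by {entry_point} "
            "(the runtime cache is a JIT-time feature)"
        )


check_unsupported_settings("compile", "/my/cache")  # accepted
try:
    check_unsupported_settings("cross_compile_for_windows", "/my/cache")
except ValueError as e:
    print("rejected:", e)
```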


Labels

backend: TensorRT-RTX, cla signed, component: api [Python], component: build system, component: conversion, component: core, component: dynamo, component: runtime, component: tests, documentation

Development

Successfully merging this pull request may close these issues.

🐛 [Bug] TensorRT-RTX: need to remove timing cache
