Update compute requirement bug 6199200 #5793

Open
sameerchavan0027 wants to merge 2 commits into isaac-sim:develop from sameerchavan0027:samc/update-doc-with-compute-requirement

Conversation

sameerchavan0027 commented May 27, 2026

Issue:
Running COMPASS residual RL training with --num_envs 64 causes CUDA out-of-memory errors on an RTX 5090 (32 GB VRAM).
The COMPASS documentation and the IsaacLab policy deployment guide do not mention any minimum GPU VRAM requirement.
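
For a sense of scale, the empirical fit quoted later in this thread (VRAM ≈ 9 GB + 1.3 GB × num_envs, measured on an RTX A6000 with Carter + nova_carter-galileo) can be evaluated for the failing configuration. The sketch below is illustrative only: the function and constant names are hypothetical, and the coefficients may not hold for other embodiments or environments.

```python
# Coefficients from the empirical fit quoted in this PR's review thread.
# They were measured on one setup and are assumptions anywhere else.
FIXED_GB = 9.0    # fixed cost, independent of num_envs
PER_ENV_GB = 1.3  # incremental cost per environment


def estimate_vram_gb(num_envs: int) -> float:
    """Return the estimated peak VRAM in GB for a given --num_envs."""
    if num_envs < 0:
        raise ValueError("num_envs must be non-negative")
    return FIXED_GB + PER_ENV_GB * num_envs


if __name__ == "__main__":
    estimate = estimate_vram_gb(64)
    print(f"--num_envs 64 -> about {estimate:.1f} GB (RTX 5090 has 32 GB)")
```

Under that fit, 64 environments would need roughly 92 GB, almost three times the 32 GB available on the card.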

Solution:
Update the documentation to state how many environments (--num_envs) can safely run on a given GPU.

Checklist

  • I have read and understood the contribution guidelines
  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

github-actions bot added the documentation label (Improvements or additions to documentation) May 27, 2026

isaaclab-review-bot bot left a comment

Thanks for improving the GPU memory documentation for NuRec training!

Summary

This PR enhances the compute requirements documentation by:

  • Providing a concrete empirical formula for VRAM estimation (VRAM ≈ 9 GB + 1.3 GB × num_envs)
  • Adding a helpful table with recommended --num_envs settings for different GPU configurations
  • Explaining the 2× overhead from NuRec Real2Sim assets vs. default COMPASS

Suggestions

A few minor observations:

  1. RTX 5090 availability: The RTX 5090 may not be widely available yet. Consider adding a more common card like RTX 4090 (24GB → ~8-10 envs safe) for broader applicability.

  2. Formula context: It might help to note which specific configuration parameters (e.g., camera resolution, USD complexity) the formula was measured with, in case users need to extrapolate beyond the tested setup.

  3. Headroom note: The 15% headroom mentioned is helpful. Consider briefly mentioning this accounts for PPO update spikes and CUDA allocator fragmentation.

CI Note

The "Check for Broken Links" failure is unrelated to this PR—it appears to be catching pre-existing redirect issues in other documentation files (e.g., tensorflow.org redirects).

Overall, this is a valuable documentation improvement that will help users avoid frustrating OOM crashes.

greptile-apps bot (Contributor) commented May 27, 2026

Greptile Summary

This PR replaces a single-sentence GPU memory note with a more detailed empirical VRAM breakdown, including a linear formula and a two-row reference table for RTX 5090 and RTX A6000 / L40 GPUs.

  • The formula (VRAM ≈ 9 GB + 1.3 GB × num_envs) and headroom calculation appear internally consistent for the A6000 row, but the RTX 5090 row shows ~13 where the same formula yields ~14.
  • The empirical measurements were taken on a single embodiment/environment pair (Carter + nova_carter-galileo); no caveat is present to warn users running other embodiments that the figures may not apply.
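
The table-versus-formula discrepancy noted above can be checked directly. The sketch below is a minimal, hypothetical helper (the names `max_num_envs`, `FIXED_GB`, `PER_ENV_GB`, and `HEADROOM` are not part of COMPASS or Isaac Lab) that applies the quoted fit with 15% headroom. It uses exact rational arithmetic because the 32 GB case lands exactly on an integer boundary (18.2 / 1.3 = 14), where binary floating-point rounding could tip the floor either way.

```python
from fractions import Fraction
from math import floor

# Coefficients quoted in the PR; valid only for the measured setup
# (RTX A6000, Carter + nova_carter-galileo). Treat them as assumptions.
FIXED_GB = Fraction(9)          # fixed VRAM cost in GB
PER_ENV_GB = Fraction(13, 10)   # 1.3 GB per environment
HEADROOM = Fraction(15, 100)    # keep 15% of total VRAM free


def max_num_envs(total_vram_gb: int) -> int:
    """Largest num_envs whose estimated VRAM fits within the headroom."""
    usable_gb = Fraction(total_vram_gb) * (1 - HEADROOM)
    envs = floor((usable_gb - FIXED_GB) / PER_ENV_GB)
    return max(0, envs)  # GPUs below the fixed cost cannot fit any env


if __name__ == "__main__":
    for name, vram_gb in [("RTX 5090", 32), ("RTX A6000 / L40", 48)]:
        print(f"{name} ({vram_gb} GB): up to {max_num_envs(vram_gb)} envs")
```

With these inputs the helper returns 14 for 32 GB and 24 for 48 GB, so the A6000 / L40 row matches the fit and the RTX 5090 row's ~13 is one environment more conservative than the fit.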

Confidence Score: 4/5

Documentation-only change; safe to merge with minor accuracy improvements recommended.

The change is limited to a single RST documentation file with no code impact. The new formula and table are a clear improvement over the previous vague note. The RTX 5090 table entry is slightly inconsistent with the formula stated in the same note, and the empirical basis of the formula is tied to one specific robot/environment combination without an explicit caveat for users of other embodiments.

docs/source/policy_deployment/03_compass_with_NuRec/compass_navigation_policy_with_NuRec.rst — the VRAM table and formula note warrant a second look for accuracy.

Important Files Changed

Filename: docs/source/policy_deployment/03_compass_with_NuRec/compass_navigation_policy_with_NuRec.rst
Overview: Documentation update replacing a vague GPU memory note with an empirical VRAM formula and a two-row reference table; RTX 5090 table entry (~13) has a small discrepancy vs. what the stated formula + 15% headroom calculation yields (~14).

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[User selects --num_envs] --> B{GPU VRAM available?}
    B -->|Known GPU| C[Look up table:\nRTX 5090 → ~13\nA6000/L40 → ~24]
    B -->|Other GPU| D[Apply formula:\nVRAM × 0.85 − 9 GB fixed\ndivide by 1.3 GB per-env]
    C --> E[Start training]
    D --> E
    E --> F{OOM hit?}
    F -->|Yes| G[Reduce --num_envs\nor lower camera resolution\nin scene_assets.camera]
    F -->|No| H[Training proceeds normally]
    G --> A
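
The retry loop in the flowchart (start training, reduce --num_envs on OOM, try again) could be automated along these lines. This is a sketch under stated assumptions, not part of the PR: `find_working_num_envs` and the `try_training` callback are hypothetical, and the callback is assumed to launch a short run and return False when it hits a CUDA out-of-memory error.

```python
from typing import Callable


def find_working_num_envs(
    start: int,
    try_training: Callable[[int], bool],
    shrink: float = 0.75,
) -> int:
    """Reduce num_envs geometrically until try_training reports success.

    try_training is a hypothetical callback: it should run a short
    training trial with the given num_envs and return False on OOM.
    """
    if start < 1:
        raise ValueError("start must be at least 1")
    if not 0.0 < shrink < 1.0:
        raise ValueError("shrink must be between 0 and 1")

    num_envs = start
    while num_envs >= 1:
        if try_training(num_envs):
            return num_envs
        # Always drop by at least one environment so the loop terminates.
        num_envs = min(num_envs - 1, int(num_envs * shrink))

    raise RuntimeError(
        "OOM even with num_envs=1; consider lowering the camera resolution"
    )


if __name__ == "__main__":
    # Stand-in for a real trial: pretend anything above 14 envs runs out
    # of memory (the fit's limit for a 32 GB card with 15% headroom).
    print(find_working_num_envs(64, lambda n: n <= 14))
```

A geometric back-off keeps the number of trial launches small, at the cost of possibly settling below the true limit; a final upward search could recover the difference.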

Reviews (1): Last reviewed commit: "Update compute requirement bug 6199200"

Comment on lines +347 to +351
Empirical fit measured on an RTX A6000 (Carter + ``nova_carter-galileo``):

.. code-block:: text

   VRAM ≈ 9 GB (fixed) + 1.3 GB × num_envs

Contributor

P2 Empirical formula scope not clearly communicated

The formula VRAM ≈ 9 GB (fixed) + 1.3 GB × num_envs was measured on a single configuration (Carter embodiment + nova_carter-galileo environment). Other embodiment types (h1, spot, g1, digit) or different NuRec environments likely have different fixed and per-env costs. Without that caveat, users running non-Carter embodiments may calibrate --num_envs using figures that don't apply to their setup and still hit OOM. Consider adding a sentence such as "These figures are specific to the Carter embodiment with the nova_carter-galileo environment; other combinations may differ."
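
For embodiments or environments other than the measured pair, the two coefficients could be re-derived from a few peak-VRAM readings. The sketch below is one possible approach (an ordinary least-squares line fit), not the method used in the PR. `fit_vram_model` is a hypothetical name, and the sample points are synthetic values generated from the quoted formula purely to show the call; they are not new measurements.

```python
from typing import Sequence, Tuple


def fit_vram_model(
    samples: Sequence[Tuple[int, float]],
) -> Tuple[float, float]:
    """Least-squares fit of vram_gb = fixed_gb + per_env_gb * num_envs.

    samples holds (num_envs, peak_vram_gb) pairs.
    Returns (fixed_gb, per_env_gb).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")

    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    if sxx == 0:
        raise ValueError("samples must use at least two distinct num_envs")

    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    per_env_gb = sxy / sxx
    fixed_gb = mean_y - per_env_gb * mean_x
    return fixed_gb, per_env_gb


if __name__ == "__main__":
    # Synthetic points computed from 9 + 1.3 * num_envs (illustration only).
    synthetic = [(4, 14.2), (8, 19.4), (16, 29.8)]
    fixed_gb, per_env_gb = fit_vram_model(synthetic)
    print(f"fixed ~ {fixed_gb:.1f} GB, per-env ~ {per_env_gb:.1f} GB")
```

Real readings would come from whatever peak-memory counter the user trusts for their setup; a fit is only as representative as the range of num_envs it was measured over.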

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: sameerchavan0027 <sameerchavan@nvidia.com>
Labels

documentation Improvements or additions to documentation
