Update compute requirement bug 6199200 #5793
Thanks for improving the GPU memory documentation for NuRec training!
Summary
This PR enhances the compute requirements documentation by:
- Providing a concrete empirical formula for VRAM estimation (VRAM ≈ 9 GB + 1.3 GB × num_envs)
- Adding a helpful table with recommended --num_envs settings for different GPU configurations
- Explaining the 2× overhead from NuRec Real2Sim assets vs. default COMPASS
Suggestions
A few minor observations:
- RTX 5090 availability: The RTX 5090 may not be widely available yet. Consider adding a more common card like RTX 4090 (24 GB → ~8-10 envs safe) for broader applicability.
- Formula context: It might help to note which specific configuration parameters (e.g., camera resolution, USD complexity) the formula was measured with, in case users need to extrapolate beyond the tested setup.
- Headroom note: The 15% headroom mentioned is helpful. Consider briefly mentioning that this accounts for PPO update spikes and CUDA allocator fragmentation.
CI Note
The "Check for Broken Links" failure is unrelated to this PR—it appears to be catching pre-existing redirect issues in other documentation files (e.g., tensorflow.org redirects).
Overall, this is a valuable documentation improvement that will help users avoid frustrating OOM crashes.
Greptile Summary
This PR replaces a single-sentence GPU memory note with a more detailed empirical VRAM breakdown, including a linear formula and a two-row reference table for RTX 5090 and RTX A6000 / L40 GPUs.
Confidence Score: 4/5
Documentation-only change; safe to merge with minor accuracy improvements recommended.
The change is limited to a single RST documentation file with no code impact. The new formula and table are a clear improvement over the previous vague note. The RTX 5090 table entry is slightly inconsistent with the formula stated in the same note, and the empirical basis of the formula is tied to one specific robot/environment combination without an explicit caveat for users of other embodiments.
docs/source/policy_deployment/03_compass_with_NuRec/compass_navigation_policy_with_NuRec.rst: the VRAM table and formula note warrant a second look for accuracy.
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[User selects --num_envs] --> B{GPU VRAM available?}
B -->|Known GPU| C[Look up table:\nRTX 5090 → ~13\nA6000/L40 → ~24]
B -->|Other GPU| D[Apply formula:\nVRAM × 0.85 − 9 GB fixed\ndivide by 1.3 GB per-env]
C --> E[Start training]
D --> E
E --> F{OOM hit?}
F -->|Yes| G[Reduce --num_envs\nor lower camera resolution\nin scene_assets.camera]
F -->|No| H[Training proceeds normally]
G --> A
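The "Other GPU" branch of the flowchart can be sketched in a few lines of Python. This is only an illustration of the arithmetic stated in the PR (15% headroom, 9 GB fixed cost, 1.3 GB per environment); the function name `max_num_envs` and its parameters are hypothetical and not part of IsaacLab or COMPASS.

```python
import math

# Constants from the PR's empirical fit (RTX A6000, Carter + nova_carter-galileo).
# They are assumptions for any other GPU, embodiment, or environment.
FIXED_GB = 9.0      # fixed VRAM cost, independent of num_envs
PER_ENV_GB = 1.3    # additional VRAM per environment
HEADROOM = 0.15     # fraction of VRAM kept free, as in the flowchart's 0.85 factor


def max_num_envs(vram_gb, fixed_gb=FIXED_GB, per_env_gb=PER_ENV_GB, headroom=HEADROOM):
    """Invert VRAM ≈ fixed + per_env × num_envs, after reserving headroom."""
    usable_gb = vram_gb * (1.0 - headroom)
    budget_gb = usable_gb - fixed_gb
    if budget_gb <= 0:
        # Not even the fixed cost fits on this card.
        return 0
    return math.floor(budget_gb / per_env_gb)


if __name__ == "__main__":
    for vram_gb in (24, 48):
        print(f"{vram_gb} GB -> --num_envs {max_num_envs(vram_gb)}")
```

For a 48 GB card this gives 24, which agrees with the A6000/L40 entry in the flowchart. The result is a starting point only; if training still hits OOM, reduce `--num_envs` further as the flowchart describes.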
Reviews (1): Last reviewed commit: "Update compute requirement bug 6199200"
Empirical fit measured on an RTX A6000 (Carter + ``nova_carter-galileo``):

.. code-block:: text

   VRAM ≈ 9 GB (fixed) + 1.3 GB × num_envs
Empirical formula scope not clearly communicated
The formula VRAM ≈ 9 GB (fixed) + 1.3 GB × num_envs was measured on a single configuration (Carter embodiment + nova_carter-galileo environment). Other embodiment types (h1, spot, g1, digit) or different NuRec environments likely have different fixed and per-env costs. Without that caveat, users running non-Carter embodiments may calibrate --num_envs using figures that don't apply to their setup and still hit OOM. Consider adding a sentence such as "These figures are specific to the Carter embodiment with the nova_carter-galileo environment; other combinations may differ."
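One way a user of a different embodiment or environment could re-derive their own constants is an ordinary least-squares line fit over a few (num_envs, peak VRAM) readings. The sketch below assumes such readings are collected separately (for example by watching `nvidia-smi` during short runs). The sample points here are synthetic placeholders generated from the documented formula, not real measurements, and `fit_linear` is a hypothetical helper.

```python
def fit_linear(samples):
    """Least-squares fit of vram = fixed + per_env * num_envs.

    samples: list of (num_envs, peak_vram_gb) pairs with at least two
    distinct num_envs values.
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    per_env = sxy / sxx                  # slope: GB per environment
    fixed = mean_y - per_env * mean_x    # intercept: fixed GB
    return fixed, per_env


# Synthetic placeholder data built from the PR's formula (9 GB + 1.3 GB × num_envs).
# Replace with peak VRAM readings from your own embodiment and environment.
samples = [(n, 9.0 + 1.3 * n) for n in (4, 8, 16)]

fixed_gb, per_env_gb = fit_linear(samples)
print(f"fixed ≈ {fixed_gb:.1f} GB, per-env ≈ {per_env_gb:.1f} GB")
```

With real measurements the points will not lie exactly on a line, so the fitted values should be treated as estimates and combined with headroom.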
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Signed-off-by: sameerchavan0027 <sameerchavan@nvidia.com>
Issue:
Running COMPASS residual RL training with --num_envs 64 causes CUDA out-of-memory errors on an RTX 5090 (32 GB VRAM).
The COMPASS documentation and the IsaacLab policy deployment guide do not mention any minimum GPU VRAM requirement.
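As a rough check of the reported failure, the PR's empirical formula can be evaluated for `--num_envs 64`. This sketch assumes the fit measured on an RTX A6000 (Carter + nova_carter-galileo) carries over approximately to the RTX 5090 setup in the issue; `estimate_vram_gb` is a hypothetical helper name.

```python
def estimate_vram_gb(num_envs, fixed_gb=9.0, per_env_gb=1.3):
    """Forward estimate: VRAM ≈ fixed + per_env × num_envs (constants from the PR)."""
    return fixed_gb + per_env_gb * num_envs


needed_gb = estimate_vram_gb(64)
available_gb = 32.0  # RTX 5090 VRAM, as stated in the issue

print(f"estimated {needed_gb:.1f} GB vs {available_gb:.0f} GB available")
if needed_gb > available_gb:
    print("expected to run out of memory at this --num_envs")
```

Under these assumptions the estimate is roughly 92 GB, close to three times the card's 32 GB, which is consistent with the out-of-memory errors described above.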
Solution:
Update the documentation to state how many environments (--num_envs) can be run safely on a given GPU.
Checklist
- pre-commit checks with ./isaaclab.sh --format
- config/extension.toml file
- CONTRIBUTORS.md or my name already exists there