Register RLinf GR00T obs/action converters for N1.6/N1.7 + config-driven mapping #5873
johnnynunez wants to merge 4 commits into
🤖 Isaac Lab Review Bot
PR #5873: Register RLinf GR00T obs/action converters for N1.6/N1.7 + config-driven mapping
✅ Summary
This PR extends the RLinf GR00T integration to support N1.6 and N1.7 embodiment modules (in addition to the existing N1.5 support) and adds config-driven unit conversion capabilities. The changes are additive and backward-compatible.
📋 Code Review
_register_gr00t_converters (Registration Logic)
- ✅ Good: Gracefully handles missing modules via try/except
- ✅ Good: Registers converters to all available GR00T versions
- ✅ Good: Clear, informative logging with module names
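The guarded registration pattern praised above could look roughly like the following. This is a minimal sketch, not the PR's code: the helper name `register_converters`, the `"isaaclab"` registry key, the injectable `importer` argument, and the assumption that `OBS_CONVERSION` and `ACTION_CONVERSION` are plain dicts are all hypothetical. Only the first module path is implied by the PR text; the N1.6/N1.7 paths are assumed to follow the same pattern.

```python
import importlib
import logging

logger = logging.getLogger(__name__)

# First path is implied by the PR; the other two are assumed to mirror it.
GR00T_SIMULATION_IO_MODULES = (
    "rlinf.models.embodiment.gr00t.simulation_io",
    "rlinf.models.embodiment.gr00t_n1d6.simulation_io",
    "rlinf.models.embodiment.gr00t_n1d7.simulation_io",
)


def register_converters(module_names, obs_fn, action_fn, key="isaaclab",
                        importer=importlib.import_module):
    """Register both converters on every module that can be imported.

    Returns the names of the modules that were actually registered.
    """
    registered = []
    for name in module_names:
        try:
            sim_io = importer(name)
        except ImportError as exc:
            # Missing GR00T variant: skip quietly so other variants still register.
            logger.debug("Could not import %s: %s", name, exc)
            continue
        # Assumes both tables are dict-like registries keyed by converter type.
        sim_io.OBS_CONVERSION[key] = obs_fn
        sim_io.ACTION_CONVERSION[key] = action_fn
        registered.append(name)
    return registered
```

Catching `ImportError` also covers `ModuleNotFoundError`, which is its subclass, while any other exception raised during import still propagates.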
Minor suggestion: Consider using `except ImportError` instead of `except Exception` for the module imports (lines 268, 275, 282). This is more precise and avoids accidentally catching unrelated exceptions during import:

    except ImportError as exc:
        logger.debug(f"Could not import GR00T N1.5 simulation_io: {exc}")

_convert_isaaclab_obs_to_gr00t (Observation Conversion)
- ✅ Good: Optional `scale`/`offset` support enables unit conversions between sim and checkpoint training units
- ✅ Good: Configurable `language_key` accommodates different checkpoint naming conventions
- ✅ Good: Backward compatible, defaults to `annotation.human.action.task_description`
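To make the observation-side points concrete, here is a minimal sketch of per-state-group unit conversion with a configurable language key. The function names, the `state.<group>` output keys, the nested `"state"` mapping layout, and the scale-before-offset order are assumptions for illustration; only `scale`, `offset`, `language_key`, and the default key string come from the PR.

```python
import numpy as np

# Default named in the PR as the prior (N1.5) behavior.
DEFAULT_LANGUAGE_KEY = "annotation.human.action.task_description"


def convert_state_group(values, scale=None, offset=None):
    """Apply an optional affine unit conversion: values * scale + offset."""
    out = np.asarray(values, dtype=np.float32)
    if scale is not None:
        out = out * np.asarray(scale, dtype=np.float32)
    if offset is not None:
        out = out + np.asarray(offset, dtype=np.float32)
    return out


def build_gr00t_obs(states, task_description, mapping):
    """Build an observation dict from sim states and a (hypothetical) mapping."""
    obs = {}
    for group, cfg in mapping.get("state", {}).items():
        obs[f"state.{group}"] = convert_state_group(
            states[group], cfg.get("scale"), cfg.get("offset")
        )
    # Falls back to the N1.5 key when no language_key is configured.
    obs[mapping.get("language_key", DEFAULT_LANGUAGE_KEY)] = [task_description]
    return obs
```

For example, a checkpoint trained on radians could be fed degree-valued sim joints by setting `scale` to `np.pi / 180` for that group, and an N1.6/N1.7 checkpoint could set `language_key` to `annotation.human.task_description`.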
_convert_gr00t_to_isaaclab_action (Action Conversion)
- ✅ Good: Configurable `gr00t_action_keys` ordering provides flexibility
- ✅ Good: Handles both `action.*` prefixed and unprefixed keys robustly
- ✅ Good: Clear `KeyError` message when expected keys are missing
- ✅ Good: Optional `scale`/`offset` for action space transformations
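The key handling described in this list can be sketched as follows. The helper names `lookup_action` and `assemble_action` are hypothetical, and raising on a missing key (rather than skipping it) is a choice made in this sketch, not necessarily what the PR does.

```python
import numpy as np

ACTION_PREFIX = "action."


def lookup_action(chunk, key):
    """Find one action group, accepting 'action.<key>' or bare '<key>'."""
    bare = key[len(ACTION_PREFIX):] if key.startswith(ACTION_PREFIX) else key
    for candidate in (ACTION_PREFIX + bare, bare):
        if candidate in chunk:
            return np.asarray(chunk[candidate], dtype=np.float32)
    raise KeyError(
        f"Action key '{key}' not found; available keys: {sorted(chunk)}"
    )


def assemble_action(chunk, action_keys):
    """Concatenate action groups along the last axis in the configured order."""
    parts = [lookup_action(chunk, key) for key in action_keys]
    return np.concatenate(parts, axis=-1)
```

Failing loudly keeps the output width fixed: if a configured key were skipped instead, the concatenated action would silently shrink.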
🔍 Observations
- Backward Compatibility: All defaults preserve the prior N1.5 behavior ✅
- Graceful Degradation: N1.6/N1.7 registrations are silent no-ops when modules are absent ✅
- Testing: Author verified with end-to-end GR00T N1.7 PPO training (50 epochs, stable losses)
- No new warnings: Pre-commit checks pass
📊 Verdict
LGTM 👍 - Clean, well-documented feature addition with proper error handling and backward compatibility. The minor suggestion above is non-blocking.
Automated review by Isaac Lab Review Bot 🦾
Update (commit 752f047): New commits add --rl_model_path CLI support, RL-finetuned weight loading, a warning for missing action keys, and extensive documentation/test updates unrelated to this PR's core feature.
Previous suggestion (`except ImportError`): Not addressed; non-blocking, original comment stands.
🔴 New issue in extension.py → _convert_gr00t_to_isaaclab_action: The padding/scale/offset block appears corrupted — there is an incomplete np.pad( call (no closing parenthesis or arguments) immediately followed by the scale/offset logic, then the original padding block is duplicated below. This will likely cause a SyntaxError or produce incorrect action transformations. The intended order (pad → scale → offset) needs to be restored as a single coherent block.
Update (commit 4d238ab):
✅ Fixed: The broken/duplicate padding block in _convert_gr00t_to_isaaclab_action has been corrected — np.pad now has proper arguments and the pad → scale → offset ordering is clean and coherent.
except ImportError suggestion — original comment stands.
No new issues introduced in this commit.
Greptile Summary
This PR extends the IsaacLab↔GR00T converter registration to cover N1.6 and N1.7 embodiment modules (previously only N1.5 was registered), and makes the obs/action converters config-driven with optional
Confidence Score: 3/5
The action converter has two logic defects in newly added code that would produce wrong robot actions when both padding and offset, or a partially-matching key list, are configured.
The action converter applies scale/offset after zero-padding, meaning any non-zero offset corrupts the padded joint positions. Separately, a configured `gr00t_action_keys` entry absent from the action chunk is silently dropped, shrinking the output tensor without any warning. Both defects are on the hot path for every action step.
source/isaaclab_contrib/isaaclab_contrib/rl/rlinf/extension.py: specifically the action-converter transform ordering and the missing-key handling in `_convert_gr00t_to_isaaclab_action`.
…fig-driven mapping

The isaaclab_contrib RLinf GR00T integration only registered its obs/action converters on `rlinf.models.embodiment.gr00t` (GR00T N1.5). RLinf also ships `gr00t_n1d6` (N1.6) and `gr00t_n1d7` (N1.7, Cosmos-Reason2-2B / Qwen3-VL) embodiment modules, each with its own simulation_io OBS/ACTION_CONVERSION table.

- `_register_gr00t_converters`: register the converters on every available GR00T embodiment module (gr00t, gr00t_n1d6, gr00t_n1d7), each guarded by a try/except import, instead of only gr00t.
- `_convert_isaaclab_obs_to_gr00t`: support optional per-state-group `scale`/`offset` and a configurable `language_key` (N1.6/N1.7 checkpoints commonly use `annotation.human.task_description`).
- `_convert_gr00t_to_isaaclab_action`: honor a configured `gr00t_action_keys` ordering and optional action `scale`/`offset`, and accept decoded action keys with or without the `action.` prefix.

Additive and backward-compatible: defaults preserve the prior N1.5 behavior, and the N1.6/N1.7 registrations are no-ops when those RLinf modules are absent.

Verified with a GR00T N1.7 (Cosmos-Reason2-2B) PPO run on a custom SO-101 task via `scripts/reinforcement_learning/train.py --rl_library rlinf` (multi-env, multi-batch, finite losses across 50 epochs).
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: Johnny <johnnync13@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> Signed-off-by: Johnny <johnnync13@gmail.com>
Restore pad-then-scale/offset order after a failed merge left an incomplete np.pad call that caused a SyntaxError.
752f047 to 4d238ab
Description
The `isaaclab_contrib` RLinf GR00T integration (`source/isaaclab_contrib/isaaclab_contrib/rl/rlinf/extension.py`) registered its IsaacLab↔GR00T obs/action converters only on `rlinf.models.embodiment.gr00t` (GR00T N1.5). RLinf also ships `gr00t_n1d6` (N1.6) and `gr00t_n1d7` (N1.7, Cosmos‑Reason2‑2B / Qwen3‑VL) embodiment modules, each with its own `simulation_io.OBS_CONVERSION` / `ACTION_CONVERSION` table, so GR00T 1.6/1.7 RL never received the IsaacLab converters and failed to find `obs_converter_type` at runtime.

This PR makes the converter registration GR00T‑version aware and the converters config‑driven:

- `_register_gr00t_converters` now registers the converters on every available GR00T embodiment module (`gr00t`, `gr00t_n1d6`, `gr00t_n1d7`), each guarded by a `try/except` import, instead of only `gr00t`.
- `_convert_isaaclab_obs_to_gr00t` supports optional per‑state‑group `scale`/`offset` (unit conversion between sim and the checkpoint's training units) and a configurable `language_key` (N1.6/N1.7 checkpoints commonly use `annotation.human.task_description` rather than the LIBERO `annotation.human.action.task_description`).
- `_convert_gr00t_to_isaaclab_action` honors a configured `gr00t_action_keys` ordering and optional action `scale`/`offset`, and accepts decoded action keys with or without the `action.` prefix.

All of this is read from the existing `env.train.isaaclab.gr00t_mapping` / `action_mapping` config blocks, so no per‑robot Python converter is needed.

Type of change

This is additive and backward‑compatible: defaults reproduce the prior N1.5 behavior, and the N1.6/N1.7 registrations are silent no‑ops when those RLinf modules are not installed.
Screenshots / testing
Verified end‑to‑end with a GR00T N1.7 (Cosmos‑Reason2‑2B) PPO run on a custom SO‑101 manipulation task via `scripts/reinforcement_learning/train.py --rl_library rlinf` (FSDP actor + HuggingFace rollout): multi‑env (4) and multi‑batch (`global_batch_size=4`) training, 50 epochs, finite advantages / policy loss / grad norm throughout, value loss decreasing. The same path also exercises `gr00t` / `gr00t_n1d6` unchanged.

Checklist

- `pre-commit` checks with `./isaaclab.sh --format` (formatting unchanged; edit is import/branch logic)