Feature: add VLA policy and registry for RL #186
Conversation
Pull request overview
Adds support for integrating a VLA (vision-language-action) model into the existing RL stack by introducing a new VLAPolicy, wiring raw (hierarchical) observations + chunked actions through collection/eval/training, and adding an entry-point based backend registry.
Changes:
- Introduce `VLAPolicy` and register it in the RL policy registry.
- Extend rollout collection/training (collector, buffer, GRPO, trainer eval) to support raw observations and action chunks (`action_chunk`/`chunk_step`).
- Add `vla_registry` to discover VLA backend factories via Python entry points.
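The register-then-build flow summarized above can be sketched as a minimal decorator-based registry. The registry mechanics, the `"vla_policy"` key, and the `build_policy` signature below are illustrative assumptions, not the PR's actual code.

```python
# Hypothetical policy registry in the spirit of "register it in the
# RL policy registry"; all names here are illustrative.
POLICY_REGISTRY = {}

def register_policy(name):
    """Class decorator that records a policy class under a string key."""
    def decorator(cls):
        POLICY_REGISTRY[name] = cls
        return cls
    return decorator

@register_policy("vla_policy")
class VLAPolicy:
    def __init__(self, policy_cfg=None, env=None):
        # build_policy may optionally pass env/policy_cfg, as the
        # models/__init__.py change suggests.
        self.policy_cfg = policy_cfg
        self.env = env

def build_policy(name, policy_cfg=None, env=None):
    """Look up a registered policy class and instantiate it."""
    cls = POLICY_REGISTRY[name]
    return cls(policy_cfg=policy_cfg, env=env)
```

A decorator-based registry keeps registration next to the class definition, so importing `models/vla_policy.py` is enough to make the policy buildable by name.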
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| embodichain/agents/rl/vla_registry.py | New entry-point based backend registry + factory creation. |
| embodichain/agents/rl/models/vla_policy.py | New VLAPolicy wrapper for VLA inference + GRPO-compatible evaluate_actions. |
| embodichain/agents/rl/models/__init__.py | Registers vla_policy; extends build_policy to optionally pass env/policy_cfg. |
| embodichain/agents/rl/collector/sync_collector.py | Adds raw-observation storage and action-chunk caching + chunk_step. |
| embodichain/agents/rl/buffer/standard_buffer.py | Adds use_raw_obs and attaches raw_obs list to shared rollout. |
| embodichain/agents/rl/buffer/utils.py | Propagates chunk_step into transition view; adds _indices in minibatches. |
| embodichain/agents/rl/algo/grpo.py | Passes rollout + num_envs into evaluate_actions; preserves raw fields across clone. |
| embodichain/agents/rl/utils/trainer.py | Adjusts buffer sizing for chunked actions; updates eval loop for raw obs/chunks. |
| embodichain/agents/rl/models/actor_only.py | Updates evaluate_actions signature to accept extra kwargs. |
| embodichain/agents/rl/models/actor_critic.py | Updates evaluate_actions signature to accept extra kwargs. |
Pull request overview
Adds first-class support for VLA-backed policies in the RL stack by introducing a VLA policy wrapper, an entry-point-based backend registry, and rollout/collector plumbing for raw observations + chunked actions.
Changes:
- Introduces `VLAPolicy` and registers it in the RL policy registry.
- Adds `vla_registry` to discover/load VLA backend factories via Python entry points.
- Extends rollout collection/training/eval utilities to support `raw_obs`, `chunk_step`, and action-chunk caching.
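The action-chunk caching plus `chunk_step` tracking listed above can be sketched as a small cache that re-queries the backend only when a chunk is exhausted. The `predict_chunk` method name and `chunk_size` parameter are illustrative assumptions.

```python
# Minimal sketch of collector-side action-chunk caching with a
# chunk_step counter; names are illustrative, not the PR's exact API.
class ChunkedActionCache:
    """Caches one action chunk and serves it one step at a time."""

    def __init__(self, policy, chunk_size):
        self.policy = policy
        self.chunk_size = chunk_size
        self._chunk = None   # cached sequence of chunk_size actions
        self.chunk_step = 0  # index of the next action to consume

    def next_action(self, obs):
        # Query the (expensive) VLA backend only when the cached
        # chunk is exhausted; otherwise consume the next cached action.
        if self._chunk is None or self.chunk_step >= self.chunk_size:
            self._chunk = self.policy.predict_chunk(obs)  # assumed API
            self.chunk_step = 0
        action = self._chunk[self.chunk_step]
        self.chunk_step += 1
        return action
```

Storing `chunk_step` alongside each transition (as the buffer changes describe) lets training later recover which position within a chunk a given action came from.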
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 14 comments.
| File | Description |
|---|---|
| embodichain/agents/rl/vla_registry.py | Entry-point discovery + factory creation for pluggable VLA backends. |
| embodichain/agents/rl/utils/trainer.py | Trainer buffer allocation + eval loop updated for raw obs and action chunks. |
| embodichain/agents/rl/models/vla_policy.py | New VLA-backed policy wrapper implementing chunked action inference and proxy log-prob evaluation. |
| embodichain/agents/rl/models/actor_only.py | Broadens evaluate_actions signature to accept extra kwargs. |
| embodichain/agents/rl/models/actor_critic.py | Broadens evaluate_actions signature to accept extra kwargs. |
| embodichain/agents/rl/models/__init__.py | Registers vla_policy and adds env-dependent initialization path in build_policy. |
| embodichain/agents/rl/collector/sync_collector.py | Adds raw_obs storage + action chunk caching + chunk_step tracking. |
| embodichain/agents/rl/buffer/utils.py | Propagates chunk_step into transition view and adds minibatch _indices. |
| embodichain/agents/rl/buffer/standard_buffer.py | Adds use_raw_obs handling and allocates rollout.raw_obs. |
| embodichain/agents/rl/algo/grpo.py | Passes rollout context into evaluate_actions and preserves rollout attributes across clone. |
A quoted diff excerpt from the raw-observation validation path (truncated in the original):

```python
if use_raw_obs:
    if raw_obs_list is None:
        raise ValueError(
```
Pull request overview
Adds VLA (Vision-Language-Action) integration into the RL stack by introducing a VLAPolicy wrapper, extending rollout collection/training to support raw observations and chunked actions, and adding a registry for VLA backends via entry points.
Changes:
- Introduce `VLAPolicy` and `vla_registry` to load and run VLA backends inside RL policies.
- Extend rollout collection/evaluation to support `use_raw_obs` and chunked actions (`action_chunk` + `chunk_step`).
- Adjust minibatching/GRPO plumbing to pass rollout context (`raw_obs`, indices) into `evaluate_actions`.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| embodichain/agents/rl/vla_registry.py | Adds entry-point-based backend discovery + factory creation for VLA backends. |
| embodichain/agents/rl/models/vla_policy.py | New policy wrapper that runs a VLA backend and exposes RL Policy interface with action chunks + raw obs. |
| embodichain/agents/rl/utils/trainer.py | Updates buffer sizing and evaluation loop to handle raw obs + chunked actions. |
| embodichain/agents/rl/train.py | Passes env into build_policy for VLA policy initialization. |
| embodichain/agents/rl/models/__init__.py | Registers vla_policy and extends build_policy to support env/policy_cfg and VLA initialization. |
| embodichain/agents/rl/collector/sync_collector.py | Extends collector to populate raw_obs, generate/consume chunked actions, and track chunk_step. |
| embodichain/agents/rl/buffer/utils.py | Propagates chunk_step into transition view; adds _indices to minibatches for mapping back to rollout. |
| embodichain/agents/rl/buffer/standard_buffer.py | Allocates/clears raw_obs and chunk_step dynamic fields for VLA workflows. |
| embodichain/agents/rl/algo/grpo.py | Passes rollout into evaluate_actions to support VLA log-prob evaluation from raw obs. |
| embodichain/agents/rl/algo/ppo.py | Removes per-update rollout cloning (now relies on shared rollout lifecycle). |
| embodichain/agents/rl/models/actor_only.py | Allows evaluate_actions(..., **kwargs) to accept rollout context without breaking. |
| embodichain/agents/rl/models/actor_critic.py | Allows evaluate_actions(..., **kwargs) to accept rollout context without breaking. |
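The last two rows explain why `evaluate_actions` gains `**kwargs`: the algorithm can pass rollout context uniformly, and only the VLA policy consumes it. The sketch below illustrates that pattern; all names and the batch/rollout shapes are assumptions.

```python
# Illustrative sketch of tolerating extra rollout context via **kwargs,
# as actor_only/actor_critic do per the summary. Names are hypothetical.
class PlainPolicy:
    def evaluate_actions(self, obs, actions, **kwargs):
        # Non-VLA policies simply ignore extra context such as
        # raw_obs or _indices passed by the algorithm.
        return {"log_prob": 0.0}

class VLAPolicyLike:
    def evaluate_actions(self, obs, actions, *, rollout=None,
                         _indices=None, **kwargs):
        # VLA evaluation maps minibatch indices back to the rollout's
        # raw observations to recompute log-probs from the backend.
        raw = [rollout["raw_obs"][i] for i in _indices] if rollout else None
        return {"log_prob": 0.0, "num_raw": len(raw) if raw else 0}

def evaluate(policy, batch, rollout):
    # The algorithm (e.g. GRPO) calls every policy the same way.
    return policy.evaluate_actions(batch["obs"], batch["act"],
                                   rollout=rollout,
                                   _indices=batch.get("_indices"))
```

Broadening the signature this way keeps existing policies source-compatible while letting callers thread new context through without per-policy branching.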
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 10 comments.
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 6 comments.
Description
- `vla_policy` wrapper to integrate a VLA model into RL policies.
- `vla_registry` to discover VLA-related factories via entry points.

Type of change
Checklist
- Ran the `black .` command to format the code base.