Feature: add VLA policy and registry for RL #186
Conversation
Pull request overview
Adds support for integrating a VLA (vision-language-action) model into the existing RL stack by introducing a new VLAPolicy, wiring raw (hierarchical) observations + chunked actions through collection/eval/training, and adding an entry-point based backend registry.
Changes:
- Introduce `VLAPolicy` and register it in the RL policy registry.
- Extend rollout collection/training (collector, buffer, GRPO, trainer eval) to support raw observations and action chunks (`action_chunk`/`chunk_step`).
- Add `vla_registry` to discover VLA backend factories via Python entry points.
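The register-then-build flow summarized above can be sketched as a minimal decorator-based registry. The registry mechanics, the `"vla_policy"` key, and the `build_policy` signature below are illustrative assumptions, not the PR's actual code.

```python
# Hypothetical policy registry in the spirit of "register it in the
# RL policy registry"; all names here are illustrative.
POLICY_REGISTRY = {}

def register_policy(name):
    """Class decorator that records a policy class under a string key."""
    def decorator(cls):
        POLICY_REGISTRY[name] = cls
        return cls
    return decorator

@register_policy("vla_policy")
class VLAPolicy:
    def __init__(self, policy_cfg=None, env=None):
        # build_policy may optionally pass env/policy_cfg, as the
        # models/__init__.py change suggests.
        self.policy_cfg = policy_cfg
        self.env = env

def build_policy(name, policy_cfg=None, env=None):
    """Look up a registered policy class and instantiate it."""
    cls = POLICY_REGISTRY[name]
    return cls(policy_cfg=policy_cfg, env=env)
```

A decorator-based registry keeps registration next to the class definition, so importing `models/vla_policy.py` is enough to make the policy buildable by name.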
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| embodichain/agents/rl/vla_registry.py | New entry-point based backend registry + factory creation. |
| embodichain/agents/rl/models/vla_policy.py | New VLAPolicy wrapper for VLA inference + GRPO-compatible evaluate_actions. |
| embodichain/agents/rl/models/__init__.py | Registers vla_policy; extends build_policy to optionally pass env/policy_cfg. |
| embodichain/agents/rl/collector/sync_collector.py | Adds raw-observation storage and action-chunk caching + chunk_step. |
| embodichain/agents/rl/buffer/standard_buffer.py | Adds use_raw_obs and attaches raw_obs list to shared rollout. |
| embodichain/agents/rl/buffer/utils.py | Propagates chunk_step into transition view; adds _indices in minibatches. |
| embodichain/agents/rl/algo/grpo.py | Passes rollout + num_envs into evaluate_actions; preserves raw fields across clone. |
| embodichain/agents/rl/utils/trainer.py | Adjusts buffer sizing for chunked actions; updates eval loop for raw obs/chunks. |
| embodichain/agents/rl/models/actor_only.py | Updates evaluate_actions signature to accept extra kwargs. |
| embodichain/agents/rl/models/actor_critic.py | Updates evaluate_actions signature to accept extra kwargs. |
Pull request overview
Adds first-class support for VLA-backed policies in the RL stack by introducing a VLA policy wrapper, an entry-point-based backend registry, and rollout/collector plumbing for raw observations + chunked actions.
Changes:
- Introduces `VLAPolicy` and registers it in the RL policy registry.
- Adds `vla_registry` to discover/load VLA backend factories via Python entry points.
- Extends rollout collection/training/eval utilities to support `raw_obs`, `chunk_step`, and action-chunk caching.
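The action-chunk caching plus `chunk_step` tracking listed above can be sketched as a small cache that re-queries the backend only when a chunk is exhausted. The `predict_chunk` method name and `chunk_size` parameter are illustrative assumptions.

```python
# Minimal sketch of collector-side action-chunk caching with a
# chunk_step counter; names are illustrative, not the PR's exact API.
class ChunkedActionCache:
    """Caches one action chunk and serves it one step at a time."""

    def __init__(self, policy, chunk_size):
        self.policy = policy
        self.chunk_size = chunk_size
        self._chunk = None   # cached sequence of chunk_size actions
        self.chunk_step = 0  # index of the next action to consume

    def next_action(self, obs):
        # Query the (expensive) VLA backend only when the cached
        # chunk is exhausted; otherwise consume the next cached action.
        if self._chunk is None or self.chunk_step >= self.chunk_size:
            self._chunk = self.policy.predict_chunk(obs)  # assumed API
            self.chunk_step = 0
        action = self._chunk[self.chunk_step]
        self.chunk_step += 1
        return action
```

Storing `chunk_step` alongside each transition (as the buffer changes describe) lets training later recover which position within a chunk a given action came from.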
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 14 comments.
| File | Description |
|---|---|
| embodichain/agents/rl/vla_registry.py | Entry-point discovery + factory creation for pluggable VLA backends. |
| embodichain/agents/rl/utils/trainer.py | Trainer buffer allocation + eval loop updated for raw obs and action chunks. |
| embodichain/agents/rl/models/vla_policy.py | New VLA-backed policy wrapper implementing chunked action inference and proxy log-prob evaluation. |
| embodichain/agents/rl/models/actor_only.py | Broadens evaluate_actions signature to accept extra kwargs. |
| embodichain/agents/rl/models/actor_critic.py | Broadens evaluate_actions signature to accept extra kwargs. |
| embodichain/agents/rl/models/__init__.py | Registers vla_policy and adds env-dependent initialization path in build_policy. |
| embodichain/agents/rl/collector/sync_collector.py | Adds raw_obs storage + action chunk caching + chunk_step tracking. |
| embodichain/agents/rl/buffer/utils.py | Propagates chunk_step into transition view and adds minibatch _indices. |
| embodichain/agents/rl/buffer/standard_buffer.py | Adds use_raw_obs handling and allocates rollout.raw_obs. |
| embodichain/agents/rl/algo/grpo.py | Passes rollout context into evaluate_actions and preserves rollout attributes across clone. |
A quoted diff excerpt from the raw-observation validation path (truncated in the original):

```python
if use_raw_obs:
    if raw_obs_list is None:
        raise ValueError(
```
Pull request overview
Adds VLA (Vision-Language-Action) integration into the RL stack by introducing a VLAPolicy wrapper, extending rollout collection/training to support raw observations and chunked actions, and adding a registry for VLA backends via entry points.
Changes:
- Introduce `VLAPolicy` and `vla_registry` to load and run VLA backends inside RL policies.
- Extend rollout collection/evaluation to support `use_raw_obs` and chunked actions (`action_chunk` + `chunk_step`).
- Adjust minibatching/GRPO plumbing to pass rollout context (`raw_obs`, indices) into `evaluate_actions`.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| embodichain/agents/rl/vla_registry.py | Adds entry-point-based backend discovery + factory creation for VLA backends. |
| embodichain/agents/rl/models/vla_policy.py | New policy wrapper that runs a VLA backend and exposes RL Policy interface with action chunks + raw obs. |
| embodichain/agents/rl/utils/trainer.py | Updates buffer sizing and evaluation loop to handle raw obs + chunked actions. |
| embodichain/agents/rl/train.py | Passes env into build_policy for VLA policy initialization. |
| embodichain/agents/rl/models/__init__.py | Registers vla_policy and extends build_policy to support env/policy_cfg and VLA initialization. |
| embodichain/agents/rl/collector/sync_collector.py | Extends collector to populate raw_obs, generate/consume chunked actions, and track chunk_step. |
| embodichain/agents/rl/buffer/utils.py | Propagates chunk_step into transition view; adds _indices to minibatches for mapping back to rollout. |
| embodichain/agents/rl/buffer/standard_buffer.py | Allocates/clears raw_obs and chunk_step dynamic fields for VLA workflows. |
| embodichain/agents/rl/algo/grpo.py | Passes rollout into evaluate_actions to support VLA log-prob evaluation from raw obs. |
| embodichain/agents/rl/algo/ppo.py | Removes per-update rollout cloning (now relies on shared rollout lifecycle). |
| embodichain/agents/rl/models/actor_only.py | Allows evaluate_actions(..., **kwargs) to accept rollout context without breaking. |
| embodichain/agents/rl/models/actor_critic.py | Allows evaluate_actions(..., **kwargs) to accept rollout context without breaking. |
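The last two rows explain why `evaluate_actions` gains `**kwargs`: the algorithm can pass rollout context uniformly, and only the VLA policy consumes it. The sketch below illustrates that pattern; all names and the batch/rollout shapes are assumptions.

```python
# Illustrative sketch of tolerating extra rollout context via **kwargs,
# as actor_only/actor_critic do per the summary. Names are hypothetical.
class PlainPolicy:
    def evaluate_actions(self, obs, actions, **kwargs):
        # Non-VLA policies simply ignore extra context such as
        # raw_obs or _indices passed by the algorithm.
        return {"log_prob": 0.0}

class VLAPolicyLike:
    def evaluate_actions(self, obs, actions, *, rollout=None,
                         _indices=None, **kwargs):
        # VLA evaluation maps minibatch indices back to the rollout's
        # raw observations to recompute log-probs from the backend.
        raw = [rollout["raw_obs"][i] for i in _indices] if rollout else None
        return {"log_prob": 0.0, "num_raw": len(raw) if raw else 0}

def evaluate(policy, batch, rollout):
    # The algorithm (e.g. GRPO) calls every policy the same way.
    return policy.evaluate_actions(batch["obs"], batch["act"],
                                   rollout=rollout,
                                   _indices=batch.get("_indices"))
```

Broadening the signature this way keeps existing policies source-compatible while letting callers thread new context through without per-policy branching.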
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 10 comments.
Pull request overview
Copilot reviewed 17 out of 17 changed files in this pull request and generated 6 comments.
Description
- `vla_policy` wrapper to integrate a VLA model into RL policies.
- `vla_registry` to discover VLA-related factories via entry points.

Type of change
Checklist
- Ran the `black .` command to format the code base.