feat(agents): Add hierarchical language support for VLA training by yuecideng · Pull Request #264 · DexForce/EmbodiChain

yuecideng · 2026-05-12T08:37:38Z

Description

This PR adds hierarchical language support to Online Data Streaming (ODS) for Vision-Language-Action (VLA) model training. It lets VLA models learn from language at several levels of abstraction (overall task, subtasks, and primitive actions), similar to how humans break a task down into steps.

Key Features

- Hierarchical Language Structure: Organizes instructions at three abstraction levels:
  - Task level: High-level goal or overall task description
  - Subtask level: Intermediate step descriptions
  - Primitive level: Low-level action descriptions
- Multiple Language Sources:
  - File-based: Load task descriptions from YAML/JSON files
  - Environment-based: Generate language from the environment
  - Template-based: Use templates with variable substitution
  - LLM-based: Generate descriptions using GPT/Claude (optional)
- Flexible Storage: Supports tokens, embeddings, or hybrid storage modes
- LanguageManager: Handles tokenization and language data management, with support for:
  - Curriculum learning (progressive complexity)
  - Data augmentation (optional)
  - Multiple tokenizer backends (HuggingFace, OpenAI/tiktoken)
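The template-based source fills variables into instruction templates. A minimal sketch of that idea using Python's standard `string.Template`, assuming one list of templates per hierarchy level; `TEMPLATES`, `render_hierarchy`, and the `$object`/`$target` placeholders are hypothetical and not taken from the PR's provider code:

```python
from string import Template

# Hypothetical templates, grouped by hierarchy level.
TEMPLATES = {
    "task": ["Put the $object into the $target."],
    "subtask": [
        "Pick up the $object.",
        "Move the $object above the $target.",
        "Place the $object in the $target.",
    ],
    "primitive": ["Open the gripper.", "Close the gripper."],
}


def render_hierarchy(templates, variables, levels=("task", "subtask", "primitive")):
    """Substitute variables into every template while keeping the level structure."""
    result = {}
    for level in levels:
        # substitute() raises KeyError when a placeholder has no value.
        result[level] = [Template(t).substitute(variables) for t in templates.get(level, [])]
    return result


instructions = render_hierarchy(TEMPLATES, {"object": "red cube", "target": "bowl"})
print(instructions["task"])  # ['Put the red cube into the bowl.']
```

Using `substitute()` rather than `safe_substitute()` makes a missing variable fail loudly instead of leaving a literal `$object` in the instruction text.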

Changes

New files:

- `embodichain/lab/gym/envs/managers/language.py`: LanguageManager, configs, and data structures
- `embodichain/lab/gym/envs/managers/language_provider.py`: Language providers for different sources
- `configs/language/`: Example configurations, documentation, and usage examples
- `tests/agents/test_language_support.py`: Test suite (7 passed, 4 skipped due to optional dependencies)

Modified files:

- `embodichain/agents/engine/data.py`: Added `language_cfg` to `OnlineDataEngineCfg` and buffer creation
- `embodichain/lab/gym/envs/embodied_env.py`: Integrated LanguageManager and language data writing
- `embodichain/lab/gym/utils/gym_utils.py`: Extended `init_rollout_buffer_from_config` to allocate language fields
- `embodichain/lab/gym/envs/managers/__init__.py`: Exported the new language classes

Buffer Structure

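When language support is enabled, the rollout buffer includes a `language` entry with the following fields (`{level}` is the per-level key, e.g. `task_level` as in `task_level_tokens` in the usage example):

- `{level}_tokens`: Token IDs for each hierarchy level, shaped `[batch_size, max_episode_steps, max_instructions, max_tokens]`
- `{level}_attention_mask`: Attention masks that mark real tokens versus padding
- `{level}_count`: Number of instructions per level
- `instruction_counts`: Instruction counts for all levels, indexed by level (task = 0, subtask = 1, primitive = 2)
- `change_points`: Timesteps at which the language instruction changes
- `hierarchy_depth`: Current depth of the hierarchy (1-3)
- `instruction_types`: Instruction type IDs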
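To gauge the memory cost of these fields, here is a sketch that derives shapes only for the fields whose layout the diff excerpts on this page reveal: token IDs are allocated as int64 tensors of shape `[batch_size, max_episode_steps, max_instructions, max_tokens]`, `instruction_counts` is written as `[env_ids, :, level_idx]`, and `hierarchy_depth` as `[env_ids, :]`. The `f"{level}_level"` key naming is inferred from `task_level_tokens`, the mask shape is assumed to match the tokens, and `language_field_shapes` and `int64_bytes` are hypothetical helpers:

```python
def language_field_shapes(batch_size, max_episode_steps, max_instructions, max_tokens,
                          levels=("task", "subtask", "primitive")):
    """Return {field_name: shape} for the fields whose shapes can be inferred.

    Fields such as change_points or instruction_types are omitted because
    their shapes are not shown in the PR excerpts.
    """
    shapes = {}
    per_step = (batch_size, max_episode_steps, max_instructions, max_tokens)
    for level in levels:
        level_key = f"{level}_level"  # inferred from keys like "task_level_tokens"
        shapes[f"{level_key}_tokens"] = per_step
        # Assumption: masks use the same shape as the token IDs.
        shapes[f"{level_key}_attention_mask"] = per_step
    # Written as instruction_counts[env_ids, :, level_idx] in the diff.
    shapes["instruction_counts"] = (batch_size, max_episode_steps, len(levels))
    # Written as hierarchy_depth[env_ids, :] in the diff.
    shapes["hierarchy_depth"] = (batch_size, max_episode_steps)
    return shapes


def int64_bytes(shape):
    """Bytes needed for an int64 tensor of the given shape (8 bytes per element)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * 8
```

Treating the usage example's `buffer_size=16` as the batch dimension, with `max_episode_steps=300`, `max_tokens=512`, and a hypothetical `max_instructions=4`, each token tensor holds 16 × 300 × 4 × 512 = 9,830,400 int64 values, or 75 MiB; an int64 attention mask of the same shape doubles that per level.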

Usage Example

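```python
# Assumed import path, based on the file list above: OnlineDataEngineCfg is
# defined in embodichain/agents/engine/data.py, and OnlineDataEngine is
# assumed to live in the same module.
from embodichain.agents.engine.data import OnlineDataEngine, OnlineDataEngineCfg

language_cfg = {
    "mode": "tokens",  # storage mode: tokens, embeddings, or hybrid
    "hierarchy_levels": ["task", "subtask", "primitive"],
    "max_tokens": 512,  # per-instruction token length; instructions are padded to it
    "tokenizer": "gpt2",
    "language_source": "file",
    "language_config_path": "configs/language/tasks_example.yaml",
}

engine_cfg = OnlineDataEngineCfg(
    buffer_size=16,
    max_episode_steps=300,
    state_dim=14,
    gym_config={...},  # environment config, elided here
    language_cfg=language_cfg,
)

engine = OnlineDataEngine(engine_cfg)
engine.start()

# Access language data.
# `dataset` is not defined in this snippet; it stands for an iterable that
# yields batches read from the ODS buffer.
for batch in dataset:
    language = batch["language"]
    task_tokens = language["task_level_tokens"]
    subtask_tokens = language["subtask_level_tokens"]
    primitive_tokens = language["primitive_level_tokens"]
```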
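Each level's tokens arrive padded to `max_tokens`, so a consumer typically drops the padding using the attention mask. A plain-Python sketch for one environment at one timestep, with nested lists standing in for tensors; `valid_instructions` is a hypothetical helper and `pad_token_id=0` in the toy data is an assumption:

```python
def valid_instructions(tokens, attention_mask):
    """Strip padding from one level at one timestep of one environment.

    tokens and attention_mask are nested lists shaped [max_instructions][max_tokens].
    Rows whose mask is all zeros are treated as empty instruction slots and dropped.
    """
    result = []
    for row_tokens, row_mask in zip(tokens, attention_mask):
        kept = [tok for tok, m in zip(row_tokens, row_mask) if m]
        if kept:
            result.append(kept)
    return result


# Toy data: max_instructions=3, max_tokens=5, pad_token_id=0 (assumed).
tokens = [[15, 27, 9, 0, 0], [42, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
mask = [[1, 1, 1, 0, 0], [1, 0, 0, 0, 0], [0, 0, 0, 0, 0]]
print(valid_instructions(tokens, mask))  # [[15, 27, 9], [42]]
```

With PyTorch tensors, the per-row filter corresponds to boolean indexing such as `row_tokens[row_mask.bool()]`.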

Type of change

- Bug fix (non-breaking change which fixes an issue)
- Enhancement (non-breaking change which improves an existing functionality)
- New feature (non-breaking change which adds functionality)
- Breaking change (existing functionality will not work without user modification)
- Documentation update

Screenshots

N/A

Checklist

- I have run the `black .` command to format the code base.
- I have made corresponding changes to the documentation (added README.md and usage examples).
- I have added tests that prove my fix is effective or that my feature works.
- Dependencies have been updated (transformers and tiktoken are optional; they are needed only for full functionality).

🤖 Generated with Claude Code

Add comprehensive language support to Online Data Streaming (ODS) for Vision-Language-Action (VLA) model training.

The implementation provides:
- Hierarchical language structure (task/subtask/primitive levels)
- Multiple language sources (file, env, template, LLM)
- Flexible storage modes (tokens, embeddings, hybrid)
- LanguageManager for tokenization and data management
- Integration with ODS shared memory buffer

New files:
- embodichain/lab/gym/envs/managers/language.py: LanguageManager, configs
- embodichain/lab/gym/envs/managers/language_provider.py: Language providers
- configs/language/: Example configurations and documentation
- tests/agents/test_language_support.py: Test suite

Modified files:
- embodichain/agents/engine/data.py: Add language_cfg to OnlineDataEngine
- embodichain/lab/gym/envs/embodied_env.py: Integrate LanguageManager
- embodichain/lab/gym/utils/gym_utils.py: Extend buffer initialization
- embodichain/lab/gym/envs/managers/__init__.py: Export language classes

This enables VLA models to learn from multi-scale language representations similar to human task understanding.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

This PR introduces hierarchical language support for Vision-Language-Action (VLA) training by extending the Online Data Streaming (ODS) rollout buffer schema and integrating language generation/tokenization into EmbodiedEnv.

Changes:

- Adds LanguageCfg, LanguageManager, and hierarchical language data structures.
- Adds language providers (file/env/template/LLM) and wires language collection into environment episode initialization.
- Extends rollout buffer initialization to optionally allocate a language TensorDict, and adds a test suite plus example configs and docs.
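The hierarchical data structures appear in the diff excerpts on this page as `LanguageData(tokens=..., attention_mask=..., raw_text=...)` and `HierarchicalLanguageData(task_level=..., subtask_level=..., primitive_level=..., change_points=...)`, with a `hierarchy_depth` attribute. A minimal sketch of how such containers could look, using plain lists instead of tensors; the depth rule (number of consecutive non-empty levels, starting from the task level) is an assumption, not the PR's definition:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LanguageData:
    """One tokenized instruction (field names taken from the diff excerpts)."""
    tokens: List[int]
    attention_mask: List[int]
    raw_text: Optional[str] = None


@dataclass
class HierarchicalLanguageData:
    task_level: List[LanguageData] = field(default_factory=list)
    subtask_level: List[LanguageData] = field(default_factory=list)
    primitive_level: List[LanguageData] = field(default_factory=list)
    change_points: List[int] = field(default_factory=list)

    @property
    def hierarchy_depth(self) -> int:
        # Assumption: count consecutive non-empty levels from the top (0..3).
        depth = 0
        for level in (self.task_level, self.subtask_level, self.primitive_level):
            if not level:
                break
            depth += 1
        return depth
```

Under this rule, an instance with every level empty, like the placeholder returned in one of the excerpts below, would report a depth of 0, outside the documented 1-3 range.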

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 20 comments.


| File | Description |
| --- | --- |
| `embodichain/lab/gym/utils/gym_utils.py` | Adds `_init_language_buffer()` and extends `init_rollout_buffer_from_config()` to allocate language fields. |
| `embodichain/lab/gym/envs/managers/language.py` | Implements language config + tokenization + hierarchical data formatting for buffers. |
| `embodichain/lab/gym/envs/managers/language_provider.py` | Adds language sources/providers (file/env/template/LLM). |
| `embodichain/lab/gym/envs/embodied_env.py` | Integrates language manager/provider and writes language into the rollout buffer at episode init. |
| `embodichain/agents/engine/data.py` | Plumbs `language_cfg` through ODS config into env config and buffer allocation. |
| `embodichain/lab/gym/envs/managers/__init__.py` | Re-exports new language modules. |
| `tests/agents/test_language_support.py` | Adds unit tests for language data structures/buffer init/providers. |
| `configs/language/README.md` | Documents configuration and buffer layout. |
| `configs/language/tasks_example.yaml` | Adds example hierarchical task descriptions. |
| `configs/language/usage_example.py` | Adds end-to-end usage examples. |


```diff
+        # Token IDs: [batch_size, max_episode_steps, max_instructions, max_tokens]
+        language_desc[f"{level_key}_tokens"] = torch.zeros(
+            (batch_size, max_episode_steps, max_instructions, max_tokens),
+            dtype=torch.int64,
+            device=device,
+        )
```


```diff
        if self.cfg.init_rollout_buffer:
+            # Determine if we need to initialize language fields
+            language_cfg = self.cfg.language if self.cfg.language else None
            self.rollout_buffer = init_rollout_buffer_from_gym_space(
                obs_space=self.observation_space,
```


```diff
+            # Write instruction count
+            count = buffer_format.get(f"{level_key}_count", torch.tensor([0]))
+            level_idx = {"task": 0, "subtask": 1, "primitive": 2}[level]
+            self.rollout_buffer["language"]["instruction_counts"][
+                env_ids, :, level_idx
+            ] = count.item()
+
```


```diff
+            # Stack instructions
+            result[f"{level_key}_tokens"] = torch.stack(padded_tokens)
+            result[f"{level_key}_attention_mask"] = torch.stack(padded_masks)
```


```diff
+                    # Empty instruction
+                    tokens = torch.full(
+                        (cfg.max_tokens,),
+                        cfg.pad_token_id,
+                        dtype=torch.int64,
+                        device="cpu",
+                    )
+                    mask = torch.zeros(
+                        (cfg.max_tokens,),
+                        dtype=torch.int64,
+                        device="cpu",
+                    )
```
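The two excerpts above pad instructions to `cfg.max_tokens` and fill unused instruction slots with pad tokens and an all-zero mask before stacking. A plain-Python sketch of that padding logic, assuming right-padding and truncation of over-long instructions (neither is confirmed by the excerpts); `pad_instruction` and `pad_level` are hypothetical names:

```python
from typing import List, Sequence, Tuple


def pad_instruction(ids: Sequence[int], max_tokens: int,
                    pad_token_id: int) -> Tuple[List[int], List[int]]:
    """Truncate or right-pad one instruction to exactly max_tokens.

    Returns (tokens, attention_mask); the mask is 1 for real tokens and 0 for padding.
    """
    ids = list(ids)[:max_tokens]  # assumed: drop tokens beyond max_tokens
    n_pad = max_tokens - len(ids)
    return ids + [pad_token_id] * n_pad, [1] * len(ids) + [0] * n_pad


def pad_level(instructions, max_instructions, max_tokens, pad_token_id):
    """Pad one level's instruction list to max_instructions rows.

    Missing rows become empty instructions: all pad_token_id with an all-zero
    mask, like the torch.full / torch.zeros branch in the excerpt above.
    """
    rows = [pad_instruction(ids, max_tokens, pad_token_id)
            for ids in instructions[:max_instructions]]
    while len(rows) < max_instructions:
        rows.append(([pad_token_id] * max_tokens, [0] * max_tokens))
    tokens = [row[0] for row in rows]
    masks = [row[1] for row in rows]
    return tokens, masks
```

The resulting `[max_instructions][max_tokens]` lists play the role of the tensors that the excerpt combines with `torch.stack`.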


```diff
+        # Would need LanguageManager to tokenize - return placeholder
+        return HierarchicalLanguageData(
+            task_level=[],  # Would be populated with LanguageData objects
+            subtask_level=[],
+            primitive_level=[],
+            change_points=change_points,
+        )
```


```diff
+                tokens, mask = self.tokenize(text)
+                return LanguageData(tokens=tokens, attention_mask=mask, raw_text=text)
+
+        temp_mgr = _TempManager(self.cfg)
```


```diff
+        log_info(
+            f"[init_rollout_buffer_from_config] Language buffer added with hierarchy levels: {language_cfg.get('hierarchy_levels', ['task', 'subtask', 'primitive'])}"
+        )
```


```diff
+            self.language_manager = LanguageManager(language_cfg, self)
+            log_info(
+                f"[EmbodiedEnv] LanguageManager initialized with source={language_source}, "
+                f"mode={language_cfg.mode}, hierarchy={language_cfg.hierarchy_levels}"
+            )
```


```diff
+        # Write hierarchy depth
+        hierarchy_depth = language_data.hierarchy_depth
+        self.rollout_buffer["language"]["hierarchy_depth"][env_ids, :] = hierarchy_depth
+
```
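The excerpts that write `instruction_counts` and `hierarchy_depth` assign one value per resetting environment across the whole time axis (`[env_ids, :, ...]`). A plain-list sketch of those broadcast writes, with hypothetical helper names `make_buffer` and `write_episode_language` and the level-to-index mapping copied from the excerpt:

```python
LEVEL_INDEX = {"task": 0, "subtask": 1, "primitive": 2}  # same mapping as the excerpt


def make_buffer(num_envs, max_steps, num_levels=3):
    """Zero-initialized stand-ins for instruction_counts and hierarchy_depth."""
    counts = [[[0] * num_levels for _ in range(max_steps)] for _ in range(num_envs)]
    depth = [[0] * max_steps for _ in range(num_envs)]
    return counts, depth


def write_episode_language(counts, depth, env_ids, level_counts, hierarchy_depth):
    """Plain-list equivalent of
        instruction_counts[env_ids, :, level_idx] = count
        hierarchy_depth[env_ids, :] = depth
    i.e. one value per selected env, repeated over every timestep."""
    for env in env_ids:
        for t in range(len(depth[env])):
            for level, n in level_counts.items():
                counts[env][t][LEVEL_INDEX[level]] = n
            depth[env][t] = hierarchy_depth


counts, depth = make_buffer(num_envs=4, max_steps=5)
write_episode_language(counts, depth, [1, 3], {"task": 1, "subtask": 3}, 2)
```

Because these writes happen at episode initialization, every timestep of a reset environment carries the same count and depth, while environments outside `env_ids` keep their previous values.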


yuecideng added the enhancement (New feature or request), agent (Features related to agentic system), data (Related to data module), and gym (robot learning env and its related features) labels on May 12, 2026

yuecideng marked this pull request as draft May 12, 2026 08:47

yuecideng requested a review from Copilot May 24, 2026 05:41

Copilot started reviewing on behalf of yuecideng May 24, 2026 05:41

Copilot AI reviewed May 24, 2026

yuecideng wants to merge 1 commit into `main` from `feat/vla-language-support`


2 participants
