feat(agents): Add hierarchical language support for VLA training #264

Draft: yuecideng wants to merge 1 commit into main from feat/vla-language-support

@yuecideng
Contributor

Description

This PR adds hierarchical language support to Online Data Streaming (ODS) for Vision-Language-Action (VLA) model training. Instructions are stored at several levels of abstraction so that VLA models can learn from multi-scale language representations, loosely mirroring how humans break a task into steps.

Key Features

  • Hierarchical Language Structure: Organizes instructions at three abstraction levels:

    • Task level: High-level goal or overall task description
    • Subtask level: Intermediate step descriptions
    • Primitive level: Low-level action descriptions
  • Multiple Language Sources:

    • File-based: Load task descriptions from YAML/JSON files
    • Environment-based: Generate language from the environment
    • Template-based: Use templates with variable substitution
    • LLM-based: Generate descriptions using GPT/Claude (optional)
  • Flexible Storage: Supports tokens, embeddings, or hybrid storage modes

  • LanguageManager: Handles tokenization and language data management with support for:

    • Curriculum learning (progressive complexity)
    • Data augmentation (optional)
    • Multiple tokenizer backends (HuggingFace, OpenAI/tiktoken)
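As an illustration of the template-based source combined with the three-level hierarchy, here is a minimal sketch using only the standard library. The `HierarchicalInstructions` class, the `TEMPLATES` dictionary, and the `render` helper are hypothetical names invented for this example; the template format actually used by the PR's providers is not shown in this description.

```python
from dataclasses import dataclass, field
from string import Template


@dataclass
class HierarchicalInstructions:
    """Plain-text instructions at the three levels named in the PR (hypothetical container)."""

    task: list[str] = field(default_factory=list)
    subtask: list[str] = field(default_factory=list)
    primitive: list[str] = field(default_factory=list)


# Hypothetical templates, one list per hierarchy level.
TEMPLATES = {
    "task": ["Put the $obj into the $target."],
    "subtask": ["Pick up the $obj.", "Place the $obj into the $target."],
    "primitive": ["Close the gripper.", "Move above the $target.", "Open the gripper."],
}


def render(templates: dict[str, list[str]], **variables: str) -> HierarchicalInstructions:
    """Substitute variables into every template; raises KeyError if a variable is missing."""
    rendered = {
        level: [Template(text).substitute(variables) for text in texts]
        for level, texts in templates.items()
    }
    return HierarchicalInstructions(**rendered)


instructions = render(TEMPLATES, obj="red cube", target="bin")
print(instructions.task)
print(instructions.subtask)
```

Using `Template.substitute` rather than `safe_substitute` makes a missing variable fail loudly instead of leaving a literal `$obj` in the training text.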

Changes

New files:

  • embodichain/lab/gym/envs/managers/language.py - LanguageManager, configs, and data structures
  • embodichain/lab/gym/envs/managers/language_provider.py - Language providers for different sources
  • configs/language/ - Example configurations, documentation, and usage examples
  • tests/agents/test_language_support.py - Test suite (7 passed, 4 skipped due to optional dependencies)

Modified files:

  • embodichain/agents/engine/data.py - Added language_cfg to OnlineDataEngineCfg and buffer creation
  • embodichain/lab/gym/envs/embodied_env.py - Integrated LanguageManager and language data writing
  • embodichain/lab/gym/utils/gym_utils.py - Extended init_rollout_buffer_from_config to allocate language fields
  • embodichain/lab/gym/envs/managers/__init__.py - Exported new language classes

Buffer Structure

When language support is enabled, the rollout buffer includes:

  • {level_key}_tokens: Token IDs for each hierarchy level (e.g. task_level_tokens)
  • {level_key}_attention_mask: Attention masks for padding (e.g. task_level_attention_mask)
  • {level_key}_count: Number of instructions per level (e.g. task_level_count)
  • instruction_counts: Counts across all levels
  • change_points: Timesteps where language changes
  • hierarchy_depth: Current depth of hierarchy (1-3)
  • instruction_types: Instruction type IDs
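One way a consumer might use change_points is to look up which instruction is in effect at a given timestep. The sketch below assumes that change_points is a sorted list of timesteps and that entry i marks the first timestep at which instruction i applies; neither assumption is confirmed by this description, and `active_instruction_index` is a hypothetical helper.

```python
from bisect import bisect_right


def active_instruction_index(change_points: list[int], timestep: int) -> int:
    """Return the index of the instruction in effect at `timestep`.

    Assumes `change_points` is sorted and that entry i is the first timestep
    at which instruction i applies (so change_points[0] is typically 0).
    """
    idx = bisect_right(change_points, timestep) - 1
    if idx < 0:
        raise ValueError("timestep precedes the first change point")
    return idx


# Hypothetical episode: three subtasks starting at timesteps 0, 40, and 120.
subtasks = ["Pick up the cube.", "Move to the bin.", "Release the cube."]
change_points = [0, 40, 120]

print(subtasks[active_instruction_index(change_points, 39)])
print(subtasks[active_instruction_index(change_points, 40)])
```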

Usage Example

# Import path assumed from the "Modified files" list above
from embodichain.agents.engine.data import OnlineDataEngine, OnlineDataEngineCfg

# Language settings passed through ODS to the environment and buffer
language_cfg = {
    "mode": "tokens",
    "hierarchy_levels": ["task", "subtask", "primitive"],
    "max_tokens": 512,
    "tokenizer": "gpt2",
    "language_source": "file",
    "language_config_path": "configs/language/tasks_example.yaml",
}

engine_cfg = OnlineDataEngineCfg(
    buffer_size=16,
    max_episode_steps=300,
    state_dim=14,
    gym_config={...},
    language_cfg=language_cfg,
)

engine = OnlineDataEngine(engine_cfg)
engine.start()

# Access language data
# (`dataset` is the dataset that reads from the engine; its construction is not shown here)
for batch in dataset:
    language = batch["language"]
    task_tokens = language["task_level_tokens"]
    subtask_tokens = language["subtask_level_tokens"]
    primitive_tokens = language["primitive_level_tokens"]
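For readers unfamiliar with how the token and attention-mask fields pair up, the following sketch pads a token list to a fixed length and builds the matching mask. It is an illustration only: `pad_with_mask` and `strip_padding` are hypothetical helpers, right-padding and a pad ID of 0 are assumptions, and the PR itself operates on tensors rather than Python lists.

```python
def pad_with_mask(
    token_ids: list[int], max_tokens: int, pad_token_id: int = 0
) -> tuple[list[int], list[int]]:
    """Truncate or right-pad `token_ids` to `max_tokens` and build the attention mask."""
    kept = token_ids[:max_tokens]
    n_pad = max_tokens - len(kept)
    tokens = kept + [pad_token_id] * n_pad
    mask = [1] * len(kept) + [0] * n_pad  # 1 = real token, 0 = padding
    return tokens, mask


def strip_padding(tokens: list[int], mask: list[int]) -> list[int]:
    """Recover the real tokens by dropping every position the mask marks as padding."""
    return [t for t, m in zip(tokens, mask) if m == 1]


tokens, mask = pad_with_mask([5, 6, 7], max_tokens=5)
print(tokens, mask)

# An empty instruction becomes all padding with an all-zero mask.
empty_tokens, empty_mask = pad_with_mask([], max_tokens=5)
print(empty_tokens, empty_mask)
```

The mask, not the pad ID, is what distinguishes padding from content, so a real token that happens to equal the pad ID is still preserved by `strip_padding`.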

Type of change

  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (non-breaking change which improves an existing functionality)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (existing functionality will not work without user modification)
  • Documentation update

Screenshots

N/A

Checklist

  • I have run the black . command to format the code base.
  • I have made corresponding changes to the documentation (added README.md and usage examples)
  • I have added tests that prove my fix is effective or that my feature works
  • Dependencies have been updated (optional - transformers/tiktoken are optional for full functionality)

🤖 Generated with Claude Code

Add comprehensive language support to Online Data Streaming (ODS) for
Vision-Language-Action (VLA) model training. The implementation provides:

- Hierarchical language structure (task/subtask/primitive levels)
- Multiple language sources (file, env, template, LLM)
- Flexible storage modes (tokens, embeddings, hybrid)
- LanguageManager for tokenization and data management
- Integration with ODS shared memory buffer

New files:
- embodichain/lab/gym/envs/managers/language.py: LanguageManager, configs
- embodichain/lab/gym/envs/managers/language_provider.py: Language providers
- configs/language/: Example configurations and documentation
- tests/agents/test_language_support.py: Test suite

Modified files:
- embodichain/agents/engine/data.py: Add language_cfg to OnlineDataEngine
- embodichain/lab/gym/envs/embodied_env.py: Integrate LanguageManager
- embodichain/lab/gym/utils/gym_utils.py: Extend buffer initialization
- embodichain/lab/gym/envs/managers/__init__.py: Export language classes

This enables VLA models to learn from multi-scale language representations
similar to human task understanding.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@yuecideng added the enhancement, agent, data, and gym labels on May 12, 2026
@yuecideng marked this pull request as draft on May 12, 2026 08:47
@yuecideng requested a review from Copilot on May 24, 2026 05:41
Contributor

Copilot AI left a comment

Pull request overview

This PR introduces hierarchical language support for Vision-Language-Action (VLA) training by extending the Online Data Streaming (ODS) rollout buffer schema and integrating language generation/tokenization into EmbodiedEnv.

Changes:

  • Adds LanguageCfg, LanguageManager, and hierarchical language data structures.
  • Adds language providers (file/env/template/LLM) and wires language collection into environment episode initialization.
  • Extends rollout buffer initialization to optionally allocate a language TensorDict and adds a test suite + example configs/docs.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 20 comments.

| File | Description |
| --- | --- |
| embodichain/lab/gym/utils/gym_utils.py | Adds _init_language_buffer() and extends init_rollout_buffer_from_config() to allocate language fields. |
| embodichain/lab/gym/envs/managers/language.py | Implements language config + tokenization + hierarchical data formatting for buffers. |
| embodichain/lab/gym/envs/managers/language_provider.py | Adds language sources/providers (file/env/template/LLM). |
| embodichain/lab/gym/envs/embodied_env.py | Integrates language manager/provider and writes language into the rollout buffer at episode init. |
| embodichain/agents/engine/data.py | Plumbs language_cfg through ODS config into env config and buffer allocation. |
| embodichain/lab/gym/envs/managers/__init__.py | Re-exports new language modules. |
| tests/agents/test_language_support.py | Adds unit tests for language data structures/buffer init/providers. |
| configs/language/README.md | Documents configuration and buffer layout. |
| configs/language/tasks_example.yaml | Adds example hierarchical task descriptions. |
| configs/language/usage_example.py | Adds end-to-end usage examples. |

Comment on lines +995 to +1000
# Token IDs: [batch_size, max_episode_steps, max_instructions, max_tokens]
language_desc[f"{level_key}_tokens"] = torch.zeros(
    (batch_size, max_episode_steps, max_instructions, max_tokens),
    dtype=torch.int64,
    device=device,
)
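The shape in the snippet above implies that token IDs are stored for every timestep of every episode slot, so the allocation grows quickly. The sketch below estimates the size of one such int64 tensor. The values 16, 300, and 512 are borrowed from the usage example in the PR description (treating batch_size as equal to buffer_size is an assumption), max_instructions=4 is a hypothetical value, and `token_buffer_bytes` is a helper invented for this example.

```python
from math import prod

INT64_BYTES = 8  # size of one torch.int64 element


def token_buffer_bytes(
    batch_size: int, max_episode_steps: int, max_instructions: int, max_tokens: int
) -> int:
    """Bytes needed for one int64 tensor of the shape used for {level_key}_tokens."""
    shape = (batch_size, max_episode_steps, max_instructions, max_tokens)
    return prod(shape) * INT64_BYTES


# Assumed values: 16, 300, 512 from the usage example; max_instructions=4 is hypothetical.
n_bytes = token_buffer_bytes(
    batch_size=16, max_episode_steps=300, max_instructions=4, max_tokens=512
)
print(f"{n_bytes / 2**20:.1f} MiB per hierarchy level")  # prints "75.0 MiB per hierarchy level"
```

Under these assumptions the token tensor alone takes 75 MiB per level, before counting any attention mask stored alongside it.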
Comment on lines 367 to 371
if self.cfg.init_rollout_buffer:
    # Determine if we need to initialize language fields
    language_cfg = self.cfg.language if self.cfg.language else None
    self.rollout_buffer = init_rollout_buffer_from_gym_space(
        obs_space=self.observation_space,
Comment on lines +776 to +782
# Write instruction count
count = buffer_format.get(f"{level_key}_count", torch.tensor([0]))
level_idx = {"task": 0, "subtask": 1, "primitive": 2}[level]
self.rollout_buffer["language"]["instruction_counts"][
    env_ids, :, level_idx
] = count.item()

Comment on lines +390 to +392
# Stack instructions
result[f"{level_key}_tokens"] = torch.stack(padded_tokens)
result[f"{level_key}_attention_mask"] = torch.stack(padded_masks)
Comment on lines +374 to +385
# Empty instruction
tokens = torch.full(
    (cfg.max_tokens,),
    cfg.pad_token_id,
    dtype=torch.int64,
    device="cpu",
)
mask = torch.zeros(
    (cfg.max_tokens,),
    dtype=torch.int64,
    device="cpu",
)
Comment on lines +633 to +639
# Would need LanguageManager to tokenize - return placeholder
return HierarchicalLanguageData(
    task_level=[],  # Would be populated with LanguageData objects
    subtask_level=[],
    primitive_level=[],
    change_points=change_points,
)
tokens, mask = self.tokenize(text)
return LanguageData(tokens=tokens, attention_mask=mask, raw_text=text)

temp_mgr = _TempManager(self.cfg)
Comment on lines +1256 to +1258
log_info(
    f"[init_rollout_buffer_from_config] Language buffer added with hierarchy levels: {language_cfg.get('hierarchy_levels', ['task', 'subtask', 'primitive'])}"
)
Comment on lines +345 to +349
self.language_manager = LanguageManager(language_cfg, self)
log_info(
    f"[EmbodiedEnv] LanguageManager initialized with source={language_source}, "
    f"mode={language_cfg.mode}, hierarchy={language_cfg.hierarchy_levels}"
)
Comment on lines +795 to +798
# Write hierarchy depth
hierarchy_depth = language_data.hierarchy_depth
self.rollout_buffer["language"]["hierarchy_depth"][env_ids, :] = hierarchy_depth
