A Python library for loading and processing MCAP data files in a form suited to machine learning and robotics training pipelines.
- Dataset-style APIs for iterating MCAP data as episodes/samples
- Built-in statistics utilities (dataset-level and episode-level)
- Convenient access to topics and attachments
- Integration CLI for training with LeRobot using MCAP as the dataset backend
Install from PyPI:
pip install mcap-data-loader
Or install from source:
git clone https://github.com/OpenGHz/MCAP-DataLoader.git --depth 1
cd MCAP-DataLoader
pip install -e .
A basic example showing how to load MCAP files from a directory, inspect statistics, and iterate through episodes/samples:
from mcap_data_loader.datasets.mcap_dataset import (
    McapFlatBuffersEpisodeDataset,
    McapFlatBuffersEpisodeDatasetConfig,
)
from pprint import pprint

dataset = McapFlatBuffersEpisodeDataset(
    McapFlatBuffersEpisodeDatasetConfig(
        data_root="data/example",
        # keys typically include topic names and optional special fields (e.g. "log_stamps")
        keys=["/follow/arm/joint_state/position", "log_stamps"],
    )
)
print(f"All files: {dataset.all_files}")
print(f"Dataset length: {len(dataset)}")
print("Dataset statistics:")
pprint(dataset.statistics())
for episode in dataset:
    print(f"Current file: {episode.config.data_root}")
    for sample in episode:
        print(f"Sample keys: {sample.keys()}")
        break
    print(f"Episode length: {len(episode)}")
    print(f"All topics: {episode.reader.all_topic_names()}")
    print(f"All attachments: {episode.reader.all_attachment_names()}")
    print("Episode statistics:")
    pprint(episode.statistics())
    print("----" * 10)
More examples and detailed usage can be found in the examples directory.
MCAP Data Loader provides a CLI to train LeRobot models using MCAP data files. This allows you to use MCAP datasets directly as the training data source for LeRobot, without needing to convert them into a different format.
You should have LeRobot installed in your environment to use this feature. You can install it from PyPI (0.4.3 is tested):
pip install lerobot
Run:
mcap_lerobot_train -c configs/config.yaml
Recommended: place your config file under a configs/ directory in your current working directory.
The top level is the standard LeRobot configuration, with an additional mcap section for MCAP dataset loading settings:
batch_size: 2
num_workers: 1
policy:
  type: act
  push_to_hub: false
  chunk_size: 2
  n_action_steps: 2
dataset:
  root: data
  repo_id: example
  streaming: true
mcap:
  states:
    - /follow/arm/joint_state/position
    - /follow/eef/joint_state/position
  actions:
    - /lead/arm/pose/position
    - /lead/arm/pose/orientation
  images:
    - /env_camera/color/image_raw
The topics listed under states and actions are loaded and concatenated to form the observation.state and action inputs required by LeRobot, which serve as the low-dimensional state and action inputs in the training data. Each topic listed under images is added to the observation.images field, with the first component of the topic name used as the key suffix: env_camera in the example above becomes observation.images.env_camera.
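The mapping from topic lists to LeRobot keys can be pictured with a small standalone sketch. It does not use the library's internals: build_lerobot_sample is a hypothetical helper, the sample dictionary is made-up data, and plain Python lists stand in for whatever array type the loader actually returns.

```python
def build_lerobot_sample(sample, states, actions, images):
    """Assemble LeRobot-style keys from per-topic values (illustrative only)."""
    out = {
        # Concatenate low-dimensional topics in the order they are listed.
        "observation.state": [v for topic in states for v in sample[topic]],
        "action": [v for topic in actions for v in sample[topic]],
    }
    for topic in images:
        # "/env_camera/color/image_raw" -> "env_camera"
        name = topic.strip("/").split("/")[0]
        out[f"observation.images.{name}"] = sample[topic]
    return out


# Made-up values for the topics used in the config above.
sample = {
    "/follow/arm/joint_state/position": [0.1, 0.2, 0.3],
    "/follow/eef/joint_state/position": [0.5],
    "/lead/arm/pose/position": [1.0, 2.0, 3.0],
    "/lead/arm/pose/orientation": [0.0, 0.0, 0.0, 1.0],
    "/env_camera/color/image_raw": "<image placeholder>",
}
result = build_lerobot_sample(
    sample,
    states=["/follow/arm/joint_state/position", "/follow/eef/joint_state/position"],
    actions=["/lead/arm/pose/position", "/lead/arm/pose/orientation"],
    images=["/env_camera/color/image_raw"],
)
print(sorted(result))
```

Because concatenation follows list order, reordering the topics under states or actions changes the layout of the resulting vectors, so the order should stay fixed between training and deployment.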
For processed data, MCAP is better suited to creating a new file that contains only the processed topics, rather than appending processed data back into the original file. For an example of generating processed topics, see Data Processing.
During training, you can specify both the original dataset directory and the processed dataset directory at the same time. MCAP Data Loader will merge them automatically at runtime, so they can be consumed as if they were read from a single dataset.
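Conceptually, the merge amounts to combining the per-sample topic dictionaries of two aligned datasets. The sketch below is only an illustration of that idea, not the library's implementation: merge_samples and merge_episodes are hypothetical names, and it assumes that episodes line up one-to-one by sample index and that processed topics use names distinct from the original ones.

```python
def merge_samples(original, processed):
    """Merge two per-sample dicts; refuse to silently overwrite a topic."""
    overlap = original.keys() & processed.keys()
    if overlap:
        raise ValueError(f"duplicate topics: {sorted(overlap)}")
    return {**original, **processed}


def merge_episodes(original_episode, processed_episode):
    """Merge two episodes sample by sample (assumes equal length and alignment)."""
    if len(original_episode) != len(processed_episode):
        raise ValueError("episodes differ in length")
    return [merge_samples(a, b) for a, b in zip(original_episode, processed_episode)]


# Made-up data: one raw topic and one processed topic per sample.
raw = [{"/follow/arm/pose/position": [0.0, 0.0, 0.0]},
       {"/follow/arm/pose/position": [0.1, 0.0, 0.0]}]
processed = [{"/follow/arm/pose/position_rela": [0.0, 0.0, 0.0]},
             {"/follow/arm/pose/position_rela": [0.1, 0.0, 0.0]}]
merged = merge_episodes(raw, processed)
print(sorted(merged[1]))
```

In practice none of this has to be written by hand, since the merge is driven entirely by the configuration.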
A typical configuration looks like this:
dataset:
  root: data
  repo_id:
    - mujoco
    - mujoco_processed
  streaming: true
Notes:
- dataset.root and dataset.repo_id are reused to specify the MCAP dataset root directory and dataset name.
- Command-line overrides compatible with LeRobot are supported and take the highest priority (they override values in the config file). For example:
mcap_lerobot_train -c configs/config.yaml --dataset.repo_id=example_task
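The precedence rule can be illustrated with a minimal sketch of dotted-key overrides applied on top of a loaded config. This is not how the CLI is implemented: apply_overrides is a hypothetical helper, and unlike a real argument parser it keeps every override value as a plain string.

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of config with "--a.b=value" style overrides applied."""
    merged = copy.deepcopy(config)
    for arg in overrides:
        key, _, value = arg.lstrip("-").partition("=")
        *parents, leaf = key.split(".")
        node = merged
        for part in parents:
            # Create intermediate sections that the file did not define.
            node = node.setdefault(part, {})
        node[leaf] = value  # the command line wins over the file value
    return merged


file_config = {"dataset": {"root": "data", "repo_id": "example", "streaming": True}}
final = apply_overrides(file_config, ["--dataset.repo_id=example_task"])
print(final["dataset"])
```

Only the overridden key changes; the remaining values from the config file are kept as they are.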
If you want to use LeRobot’s original data format (while still using this CLI), add --ori:
mcap_lerobot_train -c configs/ori.yaml --ori
Make sure the dataset path in your config points to the actual LeRobot dataset location.
Show supported parameters:
mcap_lerobot_train -h
If the output is long, redirect to a file:
mcap_lerobot_train -h > lerobot_help.txt
For pose-topic post-processing, see docs/poses.md.
The script mcap_data_loader/scripts/data_process/poses.py can be used to generate:
- relative pose topics with _rela suffix
- rotation_6d topics converted from quaternion pose topics
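For readers unfamiliar with the 6D rotation representation, here is a minimal sketch of one common way to derive it from a quaternion. It is not taken from poses.py: quat_to_rotation_6d is a hypothetical function, the (x, y, z, w) component order is an assumption, and taking the first two columns of the rotation matrix is one common convention that the script may or may not follow.

```python
import math


def quat_to_rotation_6d(x, y, z, w):
    """Convert a quaternion (assumed x, y, z, w order) to a 6D rotation vector."""
    norm = math.sqrt(x * x + y * y + z * z + w * w)
    if norm == 0.0:
        raise ValueError("zero-length quaternion")
    x, y, z, w = x / norm, y / norm, z / norm, w / norm
    # First and second columns of the equivalent 3x3 rotation matrix.
    col0 = (1 - 2 * (y * y + z * z), 2 * (x * y + z * w), 2 * (x * z - y * w))
    col1 = (2 * (x * y - z * w), 1 - 2 * (x * x + z * z), 2 * (y * z + x * w))
    return [*col0, *col1]


identity_6d = quat_to_rotation_6d(0.0, 0.0, 0.0, 1.0)
# A 90-degree rotation about the z axis.
half = math.sqrt(0.5)
rot_z_6d = quat_to_rotation_6d(0.0, 0.0, half, half)
print(identity_6d)
print([round(v, 6) for v in rot_z_6d])
```

The third column is omitted because it can be recovered as the cross product of the first two, which is why six numbers are enough to describe the rotation.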
Example:
python mcap_data_loader/scripts/data_process/poses.py \
data/example \
--keys /follow/arm/pose/position /follow/arm/pose/orientation \
--targets rela rotation_6d
See LICENSE.