Bring benchmark code back to the latest code #45
XinyuZeng wants to merge 2 commits into BerkeleyAutomation:main from
This commit adds VLA, LeRobot loaders and a comprehensive benchmarking script to evaluate loading performance across different robotics data formats (VLA, HDF5, RLDS, LeRobot/HuggingFace). The VLA loader includes both shuffled (with multiprocessing) and non-shuffled variants for flexible data loading workflows.

Key additions:
- VLALoader: Shuffled loader with multiprocessing and prefetch buffer
- NonShuffleVLALoader: Sequential loader for deterministic iteration
- LeRobotLoader: Support for HuggingFace-format datasets
- benchmarks/openx.py: Performance benchmarking across formats
- examples: Format conversion utilities (RLDS->VLA, VLA->HDF5)
- HDF5Loader: Added split parameter for train/val splits

The benchmark script measures loading times, average trajectory sizes, and per-batch performance metrics with configurable batch sizes and format selection.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
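As a rough illustration of the measurements described above, here is a minimal sketch of per-batch timing over any iterable loader. It is not the code in benchmarks/openx.py; the names `benchmark_loader` and `size_of` are hypothetical and exist only for this example.

```python
import time
from typing import Any, Callable, Dict, Iterable, Optional


def benchmark_loader(
    loader: Iterable[Any],
    num_batches: int,
    size_of: Optional[Callable[[Any], int]] = None,
) -> Dict[str, float]:
    """Time up to `num_batches` batches from any iterable loader.

    `size_of` is a hypothetical callback returning the size of one batch.
    """
    per_batch = []
    total_size = 0
    it = iter(loader)
    for _ in range(num_batches):
        start = time.perf_counter()
        try:
            batch = next(it)  # the time spent here is the loading cost
        except StopIteration:
            break  # the loader ran out before num_batches was reached
        per_batch.append(time.perf_counter() - start)
        if size_of is not None:
            total_size += size_of(batch)
    n = len(per_batch)
    total = sum(per_batch)
    return {
        "batches": n,
        "total_s": total,
        "avg_batch_s": total / n if n else 0.0,
        "avg_batch_size": total_size / n if n else 0.0,
    }
```

Timing only the `next()` call keeps the size computation out of the measured interval, so the reported numbers reflect loading alone.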
    for batch_num, data in enumerate(loader):
        if batch_num >= self.num_batches:
            break
        # self._recursively_load_data(data)
TBH I do not fully understand the purpose of this function here. Is it just to ensure the data is loaded correctly (for debugging)?
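One possible reading, stated here as an assumption and not as a description of the actual function: a helper like this walks the nested batch and touches every leaf, so that lazily loaded values are actually read and the benchmark times real I/O. A minimal sketch, with the hypothetical names `force_load` and `materialize`:

```python
from typing import Any, Callable


def force_load(data: Any, materialize: Callable[[Any], Any] = lambda leaf: leaf) -> Any:
    """Walk nested dicts, lists and tuples and apply `materialize` to every leaf.

    `materialize` is a hypothetical hook: for a lazy array type it would be
    whatever call forces the read; the default returns the leaf unchanged.
    """
    if isinstance(data, dict):
        return {key: force_load(value, materialize) for key, value in data.items()}
    if isinstance(data, (list, tuple)):
        # Rebuild the same container type (plain list or tuple).
        return type(data)(force_load(value, materialize) for value in data)
    return materialize(data)
```

If that is the intent, leaving the call commented out would mean the loop may time only the creation of lazy handles.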
    logger = logging.getLogger(__name__)


    class LeRobotLoader(BaseLoader):
This code is from the mkv branch. Unlike the other loaders, which include random shuffling, I think the LeRobotLoader does not shuffle. Maybe we should add it?
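If shuffling is added, one simple option is to shuffle the index order once per epoch. The sketch below assumes the underlying dataset is indexable and has a length. `ShuffledIndexIterator` is a hypothetical name, not part of the existing loaders or of LeRobot.

```python
import random
from typing import Any, Iterator, Optional, Sequence


class ShuffledIndexIterator:
    """Yield items of an indexable dataset, in a new random order each epoch."""

    def __init__(self, dataset: Sequence[Any], shuffle: bool = True,
                 seed: Optional[int] = None) -> None:
        self.dataset = dataset
        self.shuffle = shuffle
        self._rng = random.Random(seed)  # private RNG, so global state is untouched

    def __iter__(self) -> Iterator[Any]:
        order = list(range(len(self.dataset)))
        if self.shuffle:
            self._rng.shuffle(order)  # reshuffled every time iteration starts
        for idx in order:
            yield self.dataset[idx]
```

Keeping a `shuffle=False` path would preserve deterministic iteration for benchmarks that need it, mirroring the shuffled and non-shuffled VLA variants.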
    super(HDF5Loader, self).__init__(path)
    self.files = glob.glob(self.path, recursive=True)

    # Handle split parameter similar to VLA loader
This differs from the code in the mkv branch. For HDF5 and VLA, I assume the dataset directory is partitioned into train and test subdirectories. For example, `ls robodm/vla/nyu_door_opening_surprising_effectiveness/` lists two directories, train and test. The same layout applies to HDF5.
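To make the assumed layout concrete, here is a minimal sketch of resolving files for a given split. The function name `list_split_files` and the `.h5` extension are assumptions for illustration, not taken from the PR.

```python
import glob
import os
from typing import List, Optional


def list_split_files(root: str, split: Optional[str] = None,
                     ext: str = ".h5") -> List[str]:
    """Collect files under `root`, or under `root/<split>` when a split is given."""
    base = os.path.join(root, split) if split else root
    if split and not os.path.isdir(base):
        # Fail loudly instead of silently returning an empty file list.
        raise FileNotFoundError(f"split directory not found: {base}")
    # With recursive=True, "**" matches zero or more directory levels.
    pattern = os.path.join(base, "**", f"*{ext}")
    return sorted(glob.glob(pattern, recursive=True))
```

Raising on a missing split directory makes a dataset without the train/test partition fail early, not benchmark an empty file list.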
We need to check that the versions are the ones we want, and probably also update `pyproject.toml`.
Also include a `uv.lock` for easier reproduction.