Any suggestion on reproducing on Libero? #39

@MIKUZ12

Description

We are trying to reproduce DreamZero on LIBERO and would appreciate any guidance on the intended setup. Our current implementation converts the LIBERO dataset to LeRobot/GEAR format and fine-tunes from the DreamZero-DROID checkpoint: some heads (state_encoder, action_encoder, action_decoder) are reinitialized and trained fully, while the DiT backbone is tuned with LoRA.

One detail we handled explicitly is LIBERO's two-view input: instead of reusing the generic multi-view 2x2 layout, we added a LIBERO-specific preprocessing branch that packs the two views into a single left-right image, with the static scene camera on the left and the wrist camera on the right, and we changed the language prompt wording to match that layout (left view / right view). We also observed that both the converted LIBERO dataset videos and the online simulator observations appear vertically inverted in raw form, so we currently flip both training and evaluation inputs upright inside the shared transform, so that the model sees a consistent orientation in both settings.

Dataset-action replay in the simulator looks correct, and training losses decrease substantially, but after 10k steps the policy still does not reliably solve even this single LIBERO task in closed-loop evaluation. Could you clarify whether this two-view handling and image-orientation correction are consistent with your intended LIBERO setup, or whether LIBERO should follow a different preprocessing path?
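For reference, a minimal sketch of the preprocessing described above (the function name `pack_libero_views` and the NumPy-array interface are my own assumptions, not part of the DreamZero codebase): flip each raw view upright, then concatenate the static scene camera and the wrist camera side by side.

```python
import numpy as np

def pack_libero_views(scene_img: np.ndarray, wrist_img: np.ndarray) -> np.ndarray:
    """Flip both raw LIBERO views upright and pack them into one
    left-right image: static scene camera on the left, wrist camera
    on the right. Inputs are HxWx3 arrays with matching shapes.
    (Hypothetical helper illustrating the setup described above.)"""
    scene_up = scene_img[::-1]  # undo the vertical inversion of raw frames
    wrist_up = wrist_img[::-1]
    return np.concatenate([scene_up, wrist_up], axis=1)

# toy example: two 4x4 RGB frames -> one 4x8 packed frame
scene = np.zeros((4, 4, 3), dtype=np.uint8)
wrist = np.ones((4, 4, 3), dtype=np.uint8)
packed = pack_libero_views(scene, wrist)
print(packed.shape)  # -> (4, 8, 3)
```

Applying the same flip in the shared transform for both training data and online simulator observations keeps the two settings consistent, which is the intent of our current implementation.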
