Any suggestion on reproducing on Libero? #39

@MIKUZ12

Description

We are trying to reproduce DreamZero on LIBERO and would appreciate any guidance on the intended setup. Our current implementation converts the LIBERO dataset to LeRobot/GEAR format and fine-tunes from the DreamZero-DROID checkpoint: some heads (state_encoder, action_encoder, action_decoder) are reinitialized and trained fully, while the DiT backbone is tuned with LoRA.

One detail we handled explicitly is LIBERO's two-view input: instead of reusing the generic multi-view 2x2 layout, we added a LIBERO-specific preprocessing branch that packs the two views into a single left-right image, with the static scene camera on the left and the wrist camera on the right, and we changed the language prompt wording to match that layout (left view / right view). We also observed that both the converted LIBERO dataset videos and the online simulator observations appear vertically inverted in raw form, so we currently flip both training and evaluation inputs upright inside the shared transform, so that the model sees a consistent orientation in both settings.

Dataset-action replay in the simulator looks correct, and training losses decrease substantially, but after 10k steps the policy still does not reliably solve even this single LIBERO task in closed-loop evaluation. Could you clarify whether this two-view handling and image-orientation correction are consistent with your intended LIBERO setup, or whether LIBERO should follow a different preprocessing path?
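For reference, a minimal sketch of the preprocessing described above (the function name `pack_libero_views` and the NumPy-array interface are my own assumptions, not part of the DreamZero codebase): flip each raw view upright, then concatenate the static scene camera and the wrist camera side by side.

```python
import numpy as np

def pack_libero_views(scene_img: np.ndarray, wrist_img: np.ndarray) -> np.ndarray:
    """Flip both raw LIBERO views upright and pack them into one
    left-right image: static scene camera on the left, wrist camera
    on the right. Inputs are HxWx3 arrays with matching shapes.
    (Hypothetical helper illustrating the setup described above.)"""
    scene_up = scene_img[::-1]  # undo the vertical inversion of raw frames
    wrist_up = wrist_img[::-1]
    return np.concatenate([scene_up, wrist_up], axis=1)

# toy example: two 4x4 RGB frames -> one 4x8 packed frame
scene = np.zeros((4, 4, 3), dtype=np.uint8)
wrist = np.ones((4, 4, 3), dtype=np.uint8)
packed = pack_libero_views(scene, wrist)
print(packed.shape)  # -> (4, 8, 3)
```

Applying the same flip in the shared transform for both training data and online simulator observations keeps the two settings consistent, which is the intent of our current implementation.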
