Xihang Yu1, Rajat Talak2, Lorenzo Shaikewitz1, Luca Carlone1
1Massachusetts Institute of Technology, 2National University of Singapore
Picasso performs holistic, physics-constrained reconstruction of contact-rich multi-object scenes. Under occlusion and sensor noise, even geometrically accurate reconstructions can be physically implausible, so Picasso reasons over the scene jointly instead of reconstructing each object in isolation. We also introduce the Picasso dataset, a benchmark of 10 real-world contact-rich scenes.
- [2026-04-27] Picasso is accepted to RSS 2026.
- [2026-02-08] Picasso preprint is available on arXiv.
- [2025-02-08] The Picasso dataset is available on Google Drive.
Create an environment with Python 3.11:
conda create -n picasso python=3.11
conda activate picasso
Install PyTorch, PyTorch3D, and the remaining Python dependencies. The exact PyTorch/PyTorch3D command depends on your CUDA version; for the rest of the stack:
pip install numpy scipy scikit-image opencv-python open3d pyvista trimesh networkx libigl
Download the Picasso dataset using:
wget -O data.zip "https://drive.usercontent.google.com/download?id=1v9l7VZ7OByZit9nhoAYW2VwbsjGEHEwJ&export=download&confirm=yes"
unzip data.zip
rm data.zip
The Picasso dataset contains 10 real-world contact-rich scenes.
The demo expects the dataset root to look like:
Picasso/
├── model/
│   └── <object_name>/
│       └── <object_name>.obj
└── test/
    └── <scene_name>/
        ├── rgb/
        │   └── 0.jpg
        ├── depth/
        │   └── 0.npy or 0.png or 0.conf
        ├── mask/
        │   └── 0_<object_id>.png
        ├── pose/
        │   └── 0.json
        ├── intrinsics.json
        └── scene_gt.json
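Before running the demo, it can help to confirm that the extracted dataset matches this layout. The following is a minimal sketch that only checks for the folders and files named in the tree above; the helper names `list_scenes` and `missing_scene_entries` are hypothetical and are not part of the Picasso code.

```python
from pathlib import Path


def list_scenes(picasso_dir):
    """Return the sorted scene names found under <picasso_dir>/test."""
    test_dir = Path(picasso_dir) / "test"
    if not test_dir.is_dir():
        return []
    return sorted(p.name for p in test_dir.iterdir() if p.is_dir())


def missing_scene_entries(picasso_dir, scene):
    """Return the expected entries that are absent from one scene folder."""
    scene_dir = Path(picasso_dir) / "test" / scene
    expected_dirs = ["rgb", "depth", "mask", "pose"]
    expected_files = ["intrinsics.json", "scene_gt.json"]
    missing = [d + "/" for d in expected_dirs if not (scene_dir / d).is_dir()]
    missing += [f for f in expected_files if not (scene_dir / f).is_file()]
    return missing


if __name__ == "__main__":
    # "Picasso" is the dataset root used in the demo command.
    for scene in list_scenes("Picasso"):
        print(scene, missing_scene_entries("Picasso", scene) or "ok")
```

An empty list from `missing_scene_entries` means every entry named in the tree is present; the sketch does not inspect the individual frame files.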
Dependency graphs for the released scenes are included in:
dependency_graph_picasso_dataset/<scene_name>/dependency_graph.json
Each graph describes support/dependency edges, for example Table -> holder -> bowl.
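To illustrate what a support chain such as Table -> holder -> bowl implies, the sketch below orders objects so that each supporting object comes before the objects resting on it. The edge list is hard-coded because the schema of dependency_graph.json is not described here, and whether the demo processes objects in exactly this order is an assumption; `processing_order` is a hypothetical helper.

```python
from graphlib import TopologicalSorter


def processing_order(edges):
    """Order nodes so that every supporter precedes what it supports.

    `edges` is a list of (supporter, supported) pairs. This in-memory
    format is an assumption, not the format of dependency_graph.json.
    """
    sorter = TopologicalSorter()
    for supporter, supported in edges:
        # The supported object depends on its supporter.
        sorter.add(supported, supporter)
    return list(sorter.static_order())


# Example chain taken from the text: Table -> holder -> bowl.
edges = [("Table", "holder"), ("holder", "bowl")]
print(processing_order(edges))  # ['Table', 'holder', 'bowl']
```

`graphlib` is part of the standard library from Python 3.9, so the sketch runs in the Python 3.11 environment created above without extra packages.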
Run the demo on the holder_bowl scene:
python demo_picasso.py \
--picasso_dir Picasso \
--scene holder_bowl \
--frame 9 \
--device cuda:0 \
--output_dir ./output \
--dump_video \
--visualize
Useful arguments:
- --scene: scene name under Picasso/test.
- --frame: frame index or frame key.
- --dependency_graph_json: override the default dependency graph.
- --disable_physics: use geometric matching without physics filtering.
- --scale_min, --scale_max: range of object scales to search.
- --num_levels, --num_samples_per_level: scale-search refinement settings.
- --max_objects: process only the first N non-table graph nodes.
- --output_dir: result directory, default ./output.
- --dump_video: save an orbiting video of the final predicted scene.
- --video_length: number of frames in the output video, default 100.
- --visualize: enable the interactive PyVista visualization window.
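One plausible reading of the scale-search arguments is a coarse-to-fine search: sample a fixed number of scales in the current range, keep the best one, then shrink the range around it and repeat for several levels. The sketch below illustrates that generic technique only. It is an assumption about how such settings could interact, not the search implemented in demo_picasso.py, and `cost` is a hypothetical stand-in for whatever matching error the demo evaluates.

```python
def coarse_to_fine_scale_search(cost, scale_min, scale_max,
                                num_levels, num_samples_per_level):
    """Generic coarse-to-fine 1D search (illustrative, not Picasso's code).

    cost: callable mapping a scale to a scalar error (lower is better).
    """
    if num_samples_per_level < 2:
        raise ValueError("need at least two samples per level")
    lo, hi = scale_min, scale_max
    best = lo
    for _ in range(num_levels):
        # Evenly spaced candidates covering the current range.
        step = (hi - lo) / (num_samples_per_level - 1)
        candidates = [lo + i * step for i in range(num_samples_per_level)]
        best = min(candidates, key=cost)
        # Shrink the range to one grid step on each side of the best sample.
        lo = max(scale_min, best - step)
        hi = min(scale_max, best + step)
    return best


if __name__ == "__main__":
    # Toy cost with its minimum at scale 1.3.
    print(coarse_to_fine_scale_search(lambda s: (s - 1.3) ** 2, 0.5, 2.0, 4, 7))
```

Under this reading, more levels tighten the final estimate while more samples per level make each level less likely to miss a narrow minimum.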
The demo writes pose results to:
output/<scene>/<frame_key>/results.json
When --dump_video is enabled, it also writes:
output/videos/scene_<scene>_view_<frame_key>/mesh_video.mp4
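To check for these files after a run, the path patterns above can be assembled programmatically. This is a small sketch; `expected_outputs` is a hypothetical helper, and the frame key "9" in the example is an assumption (use whatever key the demo reports for your frame).

```python
from pathlib import Path


def expected_outputs(scene, frame_key, output_dir="./output"):
    """Build the result and video paths from the documented patterns."""
    root = Path(output_dir)
    return {
        "results": root / scene / str(frame_key) / "results.json",
        "video": root / "videos" / f"scene_{scene}_view_{frame_key}" / "mesh_video.mp4",
    }


if __name__ == "__main__":
    # Assumed frame key "9" for the holder_bowl demo command.
    for name, path in expected_outputs("holder_bowl", "9").items():
        print(name, path, "exists" if path.exists() else "missing")
```

The video path is only expected to exist when the demo was run with --dump_video.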
For example, the holder_bowl demo above should produce an orbiting video of the final predicted scene. If the video shows the reconstructed scene, your installation is working.


