Zhi Jing1,2, Jinbin Qiao2,3, Ouyang Lu2,4, Jicong Ao2, Shuang Qiu5, Yu-Gang Jiang1,*, Chenjia Bai2,*
1Fudan University†, 2Institute of Artificial Intelligence (TeleAI), China Telecom†,
3Tianjin University, 4Northwestern Polytechnical University, 5City University of Hong Kong
* Equal advising | † Equally leading organizations
- [2026-04-29] 🔓 Open-source the inference code, AssemLM-V1 weights, and a demo dataset for inference.
- [2026-04-16] 🗺️ Announce the open-source plan.
- [2026-04-10] 📄 Upload the paper to arXiv: paper
- [2026-03-15] 🎉 Release the first version of the project page.
- [2026-03-05] 🏗️ Create the project page and code repository.
```shell
# Clone the repository
git clone https://github.com/TeleHuman/AssemLM.git
cd AssemLM

# Create the conda environment and install dependencies
conda create -n assemlm python=3.10.14 -y
conda activate assemlm
bash setting.sh

# Download the AssemLM-V1 weights
mkdir models && cd models
huggingface-cli download TeleEmbodied/AssemLM-V1 --local-dir ./AssemLM-V1
cd ..

# Download the demo dataset
mkdir datasets && cd datasets
huggingface-cli download --repo-type dataset --resume-download TeleEmbodied/AssemLM --local-dir .
cd ..
```

- Run the API server for AssemLM:

```shell
bash scripts/run_api.sh
```

- Open another terminal and run the query code:

```shell
conda activate assemlm
bash scripts/query_assemlm.sh
```

After running, two folders will be created in the root directory:

- `datasets_tmp`: contains the input data for the current request.
- `results_tmp`: contains the prediction results and visualization outputs.

The first three images are from `datasets_tmp`, while the last image is from `results_tmp`.
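To browse the outputs programmatically, a minimal sketch like the one below can group the files under `results_tmp` by extension so predictions and visualizations are easy to locate. The directory layout and file types are assumptions, not part of the released scripts; adapt the root path to the actual output of `scripts/query_assemlm.sh`.

```python
# Hypothetical helper: summarize files written under results_tmp.
# The folder layout is an assumption -- adjust to the real outputs.
from pathlib import Path


def collect_outputs(root="results_tmp"):
    """Group files under `root` by file extension (without the leading dot)."""
    root = Path(root)
    groups = {}
    if not root.is_dir():
        return groups  # nothing produced yet
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups.setdefault(path.suffix.lstrip("."), []).append(str(path))
    return groups


if __name__ == "__main__":
    for ext, files in sorted(collect_outputs().items()):
        print(f".{ext}: {len(files)} file(s)")
```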
- 🔓 Release AssemLM-V1 weights, inference code, and a demo dataset.
- 📦 Release the majority of the AssemBench dataset.
- 📚 Release additional datasets and benchmark resources.
- 🧠 Release the training code.
- ⚙️ Release the data processing pipeline.
- 🚀 Release updated and improved model weights.
If you find our work helpful, please cite:
```bibtex
@article{jing2026assemlm,
  title={AssemLM: Spatial Reasoning Multimodal Large Language Models for Robotic Assembly},
  author={Jing, Zhi and Qiao, Jinbin and Lu, Ouyang and Ao, Jicong and Qiu, Shuang and Jiang, Yu-Gang and Bai, Chenjia},
  journal={arXiv preprint arXiv:2604.08983},
  year={2026}
}
```

- Our implementation builds on the open-source codebases of StarVLA, TwoByTwo, and RoboRefer.
- We also sincerely acknowledge the datasets and assets provided by PartNet, BiAssembly, TwoByTwo, PartNeXt, and IKEA-Manual.