
🏗️ AssemLM: Spatial Reasoning Multimodal Large Language Models for Robotic Assembly

Zhi Jing1,2, Jinbin Qiao2,3, Ouyang Lu2,4, Jicong Ao2, Shuang Qiu5, Yu-Gang Jiang1,*, Chenjia Bai2,*

1Fudan University, 2Institute of Artificial Intelligence (TeleAI), China Telecom,

3Tianjin University, 4Northwestern Polytechnical University, 5City University of Hong Kong

* Equal advising | Equally leading organizations

Paper arXiv Model Datasets Project Page Code

🚀 News

  • [2026-04-29] 🔓 Open-sourced the inference code, AssemLM-V1 weights, and a demo dataset for inference.
  • [2026-04-16] 🗺️ Announced the open-source plan.
  • [2026-04-10] 📄 Uploaded the paper to arXiv: paper
  • [2026-03-15] 🎉 Released the first version of the project page.
  • [2026-03-05] 🏗️ Created the project page and code repository.

⚙️ Setup Environment

Installation Steps

1. Clone the repository

git clone https://github.com/TeleHuman/AssemLM.git
cd AssemLM

2. Create and set up the conda environment

conda create -n assemlm python=3.10.14 -y
conda activate assemlm
bash setting.sh

3. Prepare the model

mkdir models && cd models
huggingface-cli download TeleEmbodied/AssemLM-V1 --local-dir ./AssemLM-V1

4. Prepare the dataset

mkdir datasets && cd datasets
huggingface-cli download --repo-type dataset --resume-download TeleEmbodied/AssemLM --local-dir .
cd ..
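After steps 3 and 4, the commands above leave a `models/AssemLM-V1` folder and a `datasets` folder in the repository root. A small sanity check like the following (a sketch, not part of the official codebase) can confirm the downloads landed where the later scripts expect them:

```python
from pathlib import Path

def check_layout(root="."):
    """Verify that the model and dataset downloads exist.

    The paths mirror the commands in this README:
    models/AssemLM-V1 (step 3) and datasets (step 4).
    """
    expected = {
        "model": Path(root) / "models" / "AssemLM-V1",
        "dataset": Path(root) / "datasets",
    }
    return {name: path.is_dir() for name, path in expected.items()}
```

If either entry is `False`, re-run the corresponding `huggingface-cli download` command before moving on.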

🚀 Getting Started

  1. Run the API server for AssemLM:
bash scripts/run_api.sh
  2. Open another terminal and run the query code:
conda activate assemlm
bash scripts/query_assemlm.sh
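The two steps above start a local API server and query it via the provided script. For custom clients, a request could be assembled roughly as sketched below. Note that the URL, route, and payload field names here are all assumptions for illustration; the actual host, port, and request schema are defined by `scripts/run_api.sh` and `scripts/query_assemlm.sh`:

```python
import json
import urllib.request

# Hypothetical endpoint: the real host, port, and route are configured
# by scripts/run_api.sh. Adjust to match your server.
API_URL = "http://localhost:8000/query"

def build_payload(image_paths, instruction):
    """Assemble a JSON request body.

    The field names ("images", "instruction") are illustrative only,
    not the server's actual schema (see scripts/query_assemlm.sh).
    """
    body = {"images": list(image_paths), "instruction": instruction}
    return json.dumps(body).encode("utf-8")

def query(image_paths, instruction, url=API_URL):
    """POST a query to the running AssemLM API server and return its JSON reply."""
    req = urllib.request.Request(
        url,
        data=build_payload(image_paths, instruction),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

When in doubt, read `scripts/query_assemlm.sh` to see the exact request the official client sends.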

After running, two folders will be created in the root directory:

  • datasets_tmp: contains the input data for the current request.
  • results_tmp: contains the prediction results and visualization outputs.

The first three images are from datasets_tmp, while the last image is from results_tmp.
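To inspect the outputs programmatically, the files in `results_tmp` can be split into visualization images and other prediction files. The extension-based split below is an assumption about the output format, not a documented guarantee:

```python
from pathlib import Path

def collect_results(results_dir="results_tmp"):
    """Partition files under results_tmp into visualization images and
    other prediction outputs (the extension-based split is an assumption)."""
    image_exts = {".png", ".jpg", ".jpeg"}
    images, predictions = [], []
    for path in sorted(Path(results_dir).rglob("*")):
        if path.is_file():
            if path.suffix.lower() in image_exts:
                images.append(path)
            else:
                predictions.append(path)
    return images, predictions
```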

🗺️ Open-Source Plan

  • 🔓 Release AssemLM-V1 weights, inference code, and a demo dataset.
  • 📦 Release the majority of the AssemBench dataset.
  • 📚 Release additional datasets and benchmark resources.
  • 🧠 Release the training code.
  • ⚙️ Release the data processing pipeline.
  • 🚀 Release updated and improved model weights.

🔖 Citation

If you find our work helpful, please cite:

@article{jing2026assemlm,
  title={AssemLM: Spatial Reasoning Multimodal Large Language Models for Robotic Assembly},
  author={Jing, Zhi and Qiao, Jinbin and Lu, Ouyang and Ao, Jicong and Qiu, Shuang and Jiang, Yu-Gang and Bai, Chenjia},
  journal={arXiv preprint arXiv:2604.08983},
  year={2026}
}

Acknowledgements