A curated, actively maintained list of surveys, papers, datasets, simulators, benchmarks, toolkits, and project pages for embodied AI, robot learning, vision-language-action models, humanoids, and safety.
330+ curated resources across 10 major research and tooling tracks.
- Fast entry points for newcomers and practical links for researchers and builders.
- Community-maintained updates through pull requests and issue reports.
- New to the field: begin with Surveys.
- Looking for policy and model work: jump to Brain Models and VLA Models.
- Building systems: use Simulators, Datasets, and Toolkits.
- Evaluating deployment risk: read Safety.
See CONTRIBUTING.md to add a paper, fix a link, or propose a new section. If this repo is useful, please star it and cite it.
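For citing, a placeholder BibTeX entry is sketched below; the citation key, title, and publication details are assumptions, so replace them with the canonical entry from the Citation section:

```bibtex
@misc{awesome-embodied-ai,
  title        = {Awesome Embodied AI},
  author       = {Yin, Cheng and Yang, Chenyu and Hu, Zhiwen and Mi, Yunxiang and Lin, Weichen and Wang, Yimeng},
  howpublished = {GitHub repository},
  note         = {Placeholder entry; key, title, and URL are illustrative}
}
```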
Cheng Yin, Chenyu Yang, Zhiwen Hu, Yunxiang Mi, Weichen Lin, Yimeng Wang.
- 2026-03-30: added a dedicated Safety section with representative papers across perception, cognition, planning, interaction, and agentic systems.
- 2025-11-05: expanded robotic code-as-policy and robotic in-context learning coverage.
- 2025-09-07: refreshed surveys, perception, brain models, VLA models, and embodied RL entries.
- Legend
- Surveys
- Perception
- Brain Models
- VLA Models
- Embodied AI and RL
- Robotic Code as Policy
- Robotic In-Context Learning
- Interaction and Humanoids
- Safety
- Simulators
- Datasets
- Toolkits
- Citation
- Acknowledgements
- public code, dataset, benchmark, simulator, or toolkit is available.
- paper only, project page only, or no maintained public repo was found.
- A few foundational works appear in multiple sections when they clearly span more than one topic.
- Teleoperation of Humanoid Robots: A Survey [Paper Link] [Project Link] [2023]
- Deep Learning Approaches to Grasp Synthesis: A Review [Paper Link] [Project Link] [2023]
- A Survey of Embodied AI: From Simulators to Research Tasks [Paper Link] [2022]
- A Survey of Embodied Learning for Object-Centric Robotic Manipulation [Paper Link] [Project Link] [2024]
- A Survey on Vision-Language-Action Models for Embodied AI [Paper Link] [2024]
- Embodied Intelligence Toward Future Smart Manufacturing in the Era of AI Foundation Model [Paper Link] [2024]
- Towards Generalist Robot Learning from Internet Video: A Survey [Paper Link] [2024]
- A Survey on Robotics with Foundation Models: toward Embodied AI [Paper Link] [2024]
- Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis [Paper Link] [Project Link] [2024]
- Robot Learning in the Era of Foundation Models: A Survey [Paper Link] [2023]
- Foundation Models in Robotics: Applications, Challenges, and the Future [Paper Link] [Project Link] [2023]
- Large Language Models for Robotics: Opportunities, Challenges, and Perspectives [Paper Link] [2024]
- Awesome-Embodied-Agent-with-LLMs [Project Link] [2024]
- Awesome Embodied Vision [Project Link] [2024]
- Awesome Touch [Project Link] [2024]
- Grasp-Anything Project [Project Link] [2024]
- GraspNet Project [Project Link] [2024]
- Deep Generative Models for Offline Policy Learning: Tutorial, Survey, and Perspectives on Future Directions [Paper Link] [Project Link] [2024]
- Survey of Learning-based Approaches for Robotic In-Hand Manipulation [Paper Link] [2024]
- A Survey of Optimization-based Task and Motion Planning: From Classical To Learning Approaches [Paper Link] [2024]
- Neural Scaling Laws in Robotics [Paper Link] [2025]
- Deep Reinforcement Learning for Robotics: A Survey of Real-World Successes [Paper Link] [2024]
- Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI [Paper Link] [Project Link] [2024]
- Controllable Text Generation for Large Language Models: A Survey [Paper Link] [Project Link] [2024]
- Bridging Language and Action: A Survey of Language-Conditioned Robot Manipulation [Paper Link] [2023]
- RGBGrasp: Image-based Object Grasping by Capturing Multiple Views during Robot Arm Movement with Neural Radiance Fields [Paper Link] [Project Link] [2024]
- RGBManip: Monocular Image-based Robotic Manipulation through Active Object Pose Estimation [Paper Link] [Project Link] [2024]
- ManipLLM: Embodied Multimodal Large Language Model for Object-Centric Robotic Manipulation [Paper Link] [Project Link] [2023]
- Play to the Score: Stage-Guided Dynamic Multi-Sensory Fusion for Robotic Manipulation [Paper Link] [Project Link] [2024]
- A Contact Model based on Denoising Diffusion to Learn Variable Impedance Control for Contact-rich Manipulation [Paper Link] [2024]
- FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects [Paper Link] [Project Link] [2024]
- BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects [Paper Link] [Project Link] [2023]
- TacDiffusion: Force-domain Diffusion Policy for Precise Tactile Manipulation [Paper Link] [Project Link] [2024]
- GelFusion: Enhancing Robotic Manipulation under Visual Constraints via Visuotactile Fusion [Paper Link] [Project Link] [2025]
- Antipodal Robotic Grasping using Generative Residual Convolutional Neural Network [Paper Link] [Project Link] [2020]
- Touch begins where vision ends: Generalizable policies for contact-rich manipulation [Paper Link] [Project Link] [2025]
- RACER: Rich Language-Guided Failure Recovery Policies for Imitation Learning [Paper Link] [Project Link] [2024]
- Errors are Useful Prompts: Instruction Guided Task Programming with Verifier-Assisted Iterative Prompting [Paper Link] [Project Link] [2023]
- Generalized Planning in PDDL Domains with Pretrained Large Language Models [Paper Link] [Project Link] [2023]
- QueST: Self-Supervised Skill Abstractions for Learning Continuous Control [Paper Link] [Project Link] [2024]
- Plan Diffuser: Grounding LLM Planners with Diffusion Models for Robotic Manipulation [Paper Link] [2024]
- Action-Free Reasoning for Policy Generalization [Paper Link] [Project Link] [2025]
- Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection [Paper Link] [Project Link] [2024]
- DoReMi: Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment [Paper Link] [Project Link] [2023]
- Chain-of-Thought Predictive Control [Paper Link] [Project Link] [2024]
- CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation [Paper Link] [Project Link] [2024]
- ClevrSkills: Compositional Language and Visual Reasoning in Robotics [Paper Link] [Project Link] [2024]
- RoboMatrix: A Skill-centric Hierarchical Framework for Scalable Robot Task Planning and Execution in Open-World [Paper Link] [Project Link] [2024]
- Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning [Paper Link] [Project Link] [2023]
- Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation [Paper Link] [Project Link] [2024]
- DexCap: Scalable and Portable Mocap Data Collection System for Dexterous Manipulation [Paper Link] [Project Link] [2024]
- HumanPlus: Humanoid Shadowing and Imitation from Humans [Paper Link] [Project Link] [2024]
- On Bringing Robots Home [Paper Link] [Project Link] [2023]
- Universal Manipulation Interface: In-The-Wild Robot Teaching Without In-The-Wild Robots [Paper Link] [Project Link] [2024]
- Diffusion Policy: Visuomotor Policy Learning via Action Diffusion [Paper Link] [Project Link] [2023]
- Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware [Paper Link] [Project Link] [2023]
- Surgical Robot Transformer (SRT): Imitation Learning for Surgical Tasks [Paper Link] [Project Link] [2024]
- Large Language Models for Orchestrating Bimanual Robots [Paper Link] [Project Link] [2024]
- Do As I Can, Not As I Say: Grounding Language in Robotic Affordances [Paper Link] [Project Link] [2022]
- ManipLLM: Embodied MLLM for Object-Centric Robotic Manipulation [Paper Link] [Project Link] [2023]
- 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations [Paper Link] [Project Link] [2024]
- Prediction with Action: Visual Policy Learning via Joint Denoising Process [Paper Link] [Project Link] [2024]
- Real-World Humanoid Locomotion with Reinforcement Learning [Paper Link] [Project Link] [2023]
- Humanoid Locomotion as Next Token Prediction [Paper Link] [2024]
- OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement [Paper Link] [Project Link] [2025]
- LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL [Paper Link] [Project Link] [2025]
- Disentangled World Models: Learning to Transfer Semantic Knowledge from Distracting Videos for Reinforcement Learning [Paper Link] [2025]
- VTAO-BiManip: Masked Visual-Tactile-Action Pre-training with Object Understanding for Bimanual Dexterous Manipulation [Paper Link] [2025]
- Learning Human-to-Humanoid Real-Time Whole-Body Teleoperation [Paper Link] [Project Link] [2024]
- On the Modeling Capabilities of Large Language Models for Sequential Decision Making [Paper Link] [2024]
- RL-VLM-F: Reinforcement Learning from Vision Language Foundation Model Feedback [Paper Link] [Project Link] [2024]
- Embodied Task Planning with Large Language Models [Paper Link] [2023]
- RoboGPT: an intelligent agent of making embodied long-term decisions for daily instruction tasks [Paper Link] [2023]
- SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks [Paper Link] [Project Link] [2023]
- An Interactive Agent Foundation Model [Paper Link] [2024]
- Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models [Paper Link] [Project Link] [2023]
- Octopus: Embodied Vision-Language Programmer from Environmental Feedback [Paper Link] [Project Link] [2023]
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control [Paper Link] [2023]
- OpenVLA: An Open-Source Vision-Language-Action Model [Paper Link] [Project Link] [2024]
- π0: A Vision-Language-Action Flow Model for General Robot Control [Paper Link] [Project Link] [2024]
- Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks [Paper Link] [Project Link] [2025]
- DoorBot: Closed-Loop Task Planning and Manipulation for Door Opening in the Wild with Haptic Feedback [Paper Link] [2025]
- IMPACT: Intelligent Motion Planning with Acceptable Contact Trajectories via Vision-Language Models [Paper Link] [Project Link] [2025]
- Collision-inclusive Manipulation Planning for Occluded Object Grasping via Compliant Robot Motions [Paper Link] [2025]
- Harnessing the Synergy between Pushing, Grasping, and Throwing to Enhance Object Manipulation in Cluttered Scenarios [Paper Link] [2024]
- GraspGPT: Leveraging Semantic Knowledge from a Large Language Model for Task-Oriented Grasping [Paper Link] [Project Link] [2023]
- RDT-1B: A Diffusion Foundation Model for Bimanual Manipulation [Paper Link] [Project Link] [2024]
- π0: A Vision-Language-Action Flow Model for General Robot Control [Paper Link] [Project Link] [2024]
- DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes [Paper Link] [Project Link] [2024]
- Yell At Your Robot: Improving On-the-Fly from Language Corrections [Paper Link] [Project Link] [2024]
- Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation [Paper Link] [Project Link] [2022]
- Q-attention: Enabling Efficient Learning for Vision-based Robotic Manipulation [Paper Link] [Project Link] [2022]
- RVT: Robotic View Transformer for 3D Object Manipulation [Paper Link] [Project Link] [2023]
- UP-VLA: A Unified Understanding and Prediction Model for Embodied Agent [Paper Link] [2025]
- Universal Actions for Enhanced Embodied Foundation Models [Paper Link] [Project Link] [2025]
- OpenVLA: An Open-Source Vision-Language-Action Model [Paper Link] [Project Link] [2024]
- AnyPlace: Learning Generalized Object Placement for Robot Manipulation [Paper Link] [Project Link] [2025]
- Robotic Control via Embodied Chain-of-Thought Reasoning [Paper Link] [Project Link] [2024]
- Language-Guided Object-Centric Diffusion Policy for Collision-Aware Robotic Manipulation [Paper Link] [2024]
- Hierarchical Diffusion Policy: manipulation trajectory generation via contact guidance [Paper Link] [Project Link] [2024]
- DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control [Paper Link] [Project Link] [2025]
- RoboGrasp: A Universal Grasping Policy for Robust Robotic Control [Paper Link] [2025]
- Improving Vision-Language-Action Model with Online Reinforcement Learning [Paper Link] [2025]
- RoboHorizon: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation [Paper Link] [2025]
- Equivariant Diffusion Policy [Paper Link] [Project Link] [2024]
- FAST: Efficient Action Tokenization for Vision-Language-Action Models [Paper Link] [Project Link] [2025]
- Gemini Robotics: Bringing AI into the Physical World [Paper Link] [2025]
- RT-H: Action Hierarchies Using Language [Paper Link] [Project Link] [2024]
- AgiBot World Colosseo: A Large-scale Manipulation Platform for Scalable and Intelligent Embodied Systems [Paper Link] [Project Link] [2025]
- OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics [Paper Link] [Project Link] [2024]
- VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models [Paper Link] [Project Link] [2023]
- ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation [Paper Link] [Project Link] [2024]
- Helpful DoggyBot: Open-World Object Fetching using Legged Robots and Vision-Language Models [Paper Link] [Project Link] [2024]
- Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning [Paper Link] [Project Link] [2023]
- Octo: An Open-Source Generalist Robot Policy [Paper Link] [Project Link] [2024]
- Vision-Language Foundation Models as Effective Robot Imitators [Paper Link] [Project Link] [2023]
- RT-1: Robotics Transformer for Real-World Control at Scale [Paper Link] [Project Link] [2022]
- PaLM-E: An Embodied Multimodal Language Model [Paper Link] [Project Link] [2023]
- RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control [Paper Link] [Project Link] [2023]
- ALOHA Unleashed: A Simple Recipe for Robot Dexterity [Paper Link] [Project Link] [2024]
- Learning Universal Policies via Text-Guided Video Generation [Paper Link] [2023]
- Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation [Paper Link] [Project Link] [2023]
- GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation [Paper Link] [Project Link] [2024]
- Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers [Paper Link] [Project Link] [2024]
- RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation [Paper Link] [Project Link] [2024]
- VTLA: Vision-Tactile-Language-Action Model with Preference Learning for Insertion Manipulation [Paper Link] [2025]
- Chain-of-Modality: Learning Manipulation Programs from Multimodal Human Videos with Vision-Language-Models [Paper Link] [2025]
- Aligning Diffusion Behaviors with Q-functions for Efficient Continuous Control [Paper Link] [Project Link] [2024]
- MENTOR: Mixture-of-Experts Network with Task-Oriented Perturbation for Visual Reinforcement Learning [Paper Link] [Project Link] [2024]
- Precise and Dexterous Robotic Manipulation via Human-in-the-Loop Reinforcement Learning [Paper Link] [Project Link] [2024]
- Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation [Paper Link] [Project Link] [2025]
- Adaptive Wiping: Adaptive contact-rich manipulation through few-shot imitation learning with Force-Torque feedback and pre-trained object representations [Paper Link] [2024]
- CodeDiffuser: Attention-Enhanced Diffusion Policy via VLM-Generated Code for Instruction Ambiguity [Paper Link] [Project Link] [2025]
- Embodied large language models enable robots to complete complex tasks in unpredictable environments [Paper Link] [Project Link] [2025]
- Maestro: Orchestrating Robotics Modules with Vision-Language Models for Zero-Shot Generalist Robots [Paper Link] [2025]
- Code as Policies: Language Model Programs for Embodied Control [Paper Link] [Project Link] [2023]
- Manipulate-Anything: Automating Real-World Robots using Vision-Language Models [Paper Link] [Project Link] [2024]
- In-Context Learning Enables Robot Action Prediction in LLMs [Paper Link] [Project Link] [2025]
- Learning to Learn Faster from Human Feedback with Language Model Predictive Control [Paper Link] [Project Link] [2024]
- ELEGNT: Expressive and Functional Movement Design for Non-anthropomorphic Robot [Paper Link] [Project Link] [2025]
- Generative Expressive Robot Behaviors using Large Language Models [Paper Link] [Project Link] [2024]
- A Generative Model to Embed Human Expressivity into Robot Motions [Paper Link] [2024]
- Exploring the Design Space of Extra-Linguistic Expression for Robots [Paper Link] [2023]
- Collection of Metaphors for Human-Robot Interaction [Paper Link] [2021]
- RHINO: Learning Real-Time Humanoid-Human-Object Interaction from Human Demonstrations [Paper Link] [Project Link] [2025]
- ASAP: Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills [Paper Link] [Project Link] [2025]
- ExBody2: Advanced Expressive Humanoid Whole-Body Control [Paper Link] [Project Link] [2024]
- Expressive Whole-Body Control for Humanoid Robots [Paper Link] [Project Link] [2024]
- HOVER: Versatile Neural Whole-Body Controller for Humanoid Robots [Paper Link] [Project Link] [2024]
- OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning [Paper Link] [Project Link] [2024]
- Learning Human-to-Humanoid Real-Time Whole-Body Teleoperation [Paper Link] [Project Link] [2024]
- Learning from Massive Human Videos for Universal Humanoid Pose Control [Paper Link] [Project Link] [2024]
- Mobile-TeleVision: Predictive Motion Priors for Humanoid Whole-Body Control [Paper Link] [Project Link] [2024]
- HumanPlus: Humanoid Shadowing and Imitation from Humans [Paper Link] [Project Link] [2024]
- Humanoid-VLA: Towards Universal Humanoid Control with Visual Integration [Paper Link] [2025]
- XBG: End-to-End Imitation Learning for Autonomous Behaviour in Human-Robot Interaction and Collaboration [Paper Link] [2024]
- EMOTION: Expressive Motion Sequence Generation for Humanoid Robots with In-Context Learning [Paper Link] [Project Link] [2024]
- HARMON: Whole-Body Motion Generation of Humanoid Robots from Language Descriptions [Paper Link] [Project Link] [2024]
- ImitationNet: Unsupervised Human-to-Robot Motion Retargeting via Shared Latent Space [Paper Link] [Project Link] [2023]
- FABG: End-to-end Imitation Learning for Embodied Affective Human-Robot Interaction [Paper Link] [Project Link] [2025]
- HAPI: A Model for Learning Robot Facial Expressions from Human Preferences [Paper Link] [2025]
- Human-robot facial coexpression [Paper Link] [2024]
- Unlocking Human-Like Facial Expressions in Humanoid Robots: A Novel Approach for Action Unit Driven Facial Expression Disentangled Synthesis [Paper Link] [2024]
- UGotMe: An Embodied System for Affective Human-Robot Interaction [Paper Link] [Project Link] [2024]
- Knowing Where to Look: A Planning-based Architecture to Automate the Gaze Behavior of Social Robots [Paper Link] [2022]
- Naturalistic Head Motion Generation from Speech [Paper Link] [2022]
- Transitioning to Human Interaction with AI Systems: New Challenges and Opportunities for HCI Professionals to Enable Human-Centered AI [Paper Link] [2023]
- Roots and Requirements for Collaborative AI [Paper Link] [2023]
- From Human-Computer Interaction to Human-AI Interaction: New Challenges and Opportunities for Enabling Human-Centered AI [Paper Link] [2021]
- From explainable to interactive AI: A literature review on current trends in human-AI interaction [Paper Link] [2024]
- Treat robots as humans? Perspective choice in human-human and human-robot spatial language interaction [Paper Link] [2023]
- Advances in Large Language Models for Robotics [Paper Link] [2024]
- Grounding Language to Natural Human-Robot Interaction in Robot Navigation Tasks [Paper Link] [2021]
- Multi-modal interaction with transformers: bridging robots and human with natural language [Paper Link] [2024]
- Robot Control Platform for Multimodal Interactions with Humans Based on ChatGPT [Paper Link] [2024]
- Multi-Grained Multimodal Interaction Network for Sentiment Analysis [Paper Link] [2024]
- Vision-Language Navigation with Embodied Intelligence: A Survey [Paper Link] [2024]
- SweepMM: A High-Quality Multimodal Dataset for Sweeping Robots in Home Scenarios for Vision-Language Model [Paper Link] [2024]
- Recent advancements in multimodal human–robot interaction [Paper Link] [2023]
- Multi-Modal Data Fusion in Enhancing Human-Machine Interaction for Robotic Applications: A Survey [Paper Link] [2022]
- LaMI: Large Language Models for Multi-Modal Human-Robot Interaction [Paper Link] [2024]
- "Help Me Help the AI": Understanding How Explainability Can Support Human-AI Interaction [Paper Link] [2022]
- Employing Co-Learning to Evaluate the Explainability of Multimodal Sentiment Analysis [Paper Link] [2024]
- Towards Responsible AI: Developing Explanations to Increase Human-AI Collaboration [Paper Link] [2023]
- Toward Affective XAI: Facial Affect Analysis for Understanding Explainable Human-AI Interactions [Paper Link] [2021]
As embodied AI systems are deployed in safety-critical environments (autonomous driving, healthcare, household robotics), ensuring their safety becomes both technically challenging and socially indispensable. This section highlights representative work on attacks and defenses across five safety layers. We intentionally select roughly 80 representative papers rather than the full 400+ to keep this repo manageable; for the complete collection, see Awesome-Embodied-AI-Safety.
- Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses [Paper Link] [Project Link] [2026]
- Safety at Scale: A Comprehensive Survey of Large Model Safety [Paper Link] [Project Link] [2025]
Visual Perception — adversarial attacks and backdoors on visual recognition, detection, and tracking:
- Robust physical-world attacks on deep learning visual classification [Paper Link] [2018]
- Phantom of the ADAS: Securing advanced driver-assistance systems from split-second phantom attacks [Paper Link] [2020]
- BadEncoder: Backdoor Attacks to Pre-trained Encoders in Self-Supervised Learning [Paper Link] [2022]
- Understanding Zero-Shot Adversarial Robustness for Large-Scale Models [Paper Link] [2023]
- AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models [Paper Link] [2025]
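Many of the attacks above share a common core: perturb the input a bounded amount along the sign of the loss gradient. A minimal FGSM-style sketch against a toy linear classifier (the model, weights, and logistic-loss choice here are illustrative assumptions, not taken from any listed paper):

```python
import numpy as np

def fgsm_attack(x, w, b, y, eps):
    """One-step FGSM on a toy linear scorer f(x) = w.x + b with logistic loss.

    x: input vector, y: true label in {0, 1}, eps: perturbation budget.
    Returns x shifted by eps along the sign of the loss gradient w.r.t. x.
    """
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted probability of class 1
    grad_x = (p - y) * w                    # d(logistic loss)/dx for this scorer
    return x + eps * np.sign(grad_x)

# A perturbation of eps per coordinate is enough to flip the toy decision:
w, b = np.array([1.0, -2.0]), 0.0
x, y = np.array([0.5, 0.0]), 1.0
x_adv = fgsm_attack(x, w, b, y, eps=0.3)
```

Real attacks on perception stacks (as in the papers above) apply the same gradient principle to deep networks, often under physical-world constraints such as printable patches or lighting changes.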
Auditory Perception — voice command injection, audio adversarial examples, and defenses:
- Hidden voice commands [Paper Link] [2016]
- Devil's Whisper: A General Approach for Physical Adversarial Attacks against Commercial Black-box Speech Recognition Devices [Paper Link] [2020]
- SpecPatch: Human-in-the-loop adversarial audio spectrogram patch attack on speech recognition [Paper Link] [2022]
- TrojanModel: A practical trojan attack against automatic speech recognition systems [Paper Link] [2023]
- Antifake: Using adversarial audio to prevent unauthorized speech synthesis [Paper Link] [2023]
Spatial Perception — LiDAR spoofing, point cloud attacks, and 3D perception robustness:
- Physically realizable adversarial examples for lidar object detection [Paper Link] [2020]
- Invisible for both Camera and LiDAR [Paper Link] [2021]
- Exorcising "Wraith": Protecting LiDAR-based Object Detector in Automated Driving System from Appearing Attacks [Paper Link] [2023]
- Adversary is on the Road: Attacks on Visual SLAM with Robust Perturbations on Point Clouds [Paper Link] [2024]
- Towards Real-Time Defense against Object-Based LiDAR Attacks in Autonomous Driving [Paper Link] [2025]
Motion Perception — IMU/GPS/radar sensor spoofing and drone attacks:
- Rocking drones with intentional sound noise on gyroscopic sensors [Paper Link] [2015]
- WALNUT: Waging doubt on the integrity of MEMS accelerometers with acoustic injection attacks [Paper Link] [2017]
- Drift with Devil: Security of Multi-Sensor Fusion based Localization in Autonomous Driving under GPS Spoofing [Paper Link] [2020]
- mmSpoof: Resilient spoofing of automotive millimeter-wave radars using reflect array [Paper Link] [2023]
- Paralyzing Drones via EMI Signal Injection on Sensory Communication Channels [Paper Link] [2023]
Cross-Modal Perception — attacks exploiting multi-sensor fusion inconsistencies:
- Security Analysis of Camera-LiDAR Fusion Against Black-Box Attacks on Autonomous Vehicles [Paper Link] [2022]
- Exploring Adversarial Robustness of LiDAR-Camera Fusion Model in Autonomous Driving [Paper Link] [2023]
- Malicious Attacks against Multi-Sensor Fusion in Autonomous Driving [Paper Link] [2024]
Instruction Understanding — attacks on embodied instruction following and VQA:
- SQA3D: Situated Question Answering in 3D Scenes [Paper Link] [2023]
- Can we trust embodied agents? Exploring backdoor attacks against embodied LLM-based decision-making systems [Paper Link] [2024]
- AGENTSAFE: Benchmarking the Safety of Embodied Agents on Hazardous Instructions [Paper Link] [2025]
- RoboSafe: Safeguarding Embodied Agents via Executable Safety Logic [Paper Link] [2025]
World Model — hallucination, robustness, and safety in learned world models:
- SafeDreamer: Safe Reinforcement Learning with World Models [Paper Link] [2024]
- Multi-Object Hallucination in Vision Language Models [Paper Link] [2024]
- Learning Latent Dynamic Robust Representations for World Models [Paper Link] [2024]
Reasoning — jailbreaking chain-of-thought and embodied reasoning:
- Do As I Can, Not As I Say: Grounding Language in Robotic Affordances [Paper Link] [2022]
- Inner Monologue: Embodied Reasoning through Planning with Language Models [Paper Link] [2022]
- Agent Smith: A Single Image Can Jailbreak One Million Multimodal LLM Agents Exponentially Fast [Paper Link] [2024]
- H-CoT: Hijacking the Chain-of-Thought Safety Reasoning Mechanism to Jailbreak Large Reasoning Models [Paper Link] [2025]
Task Planning — jailbreaking LLM planners and backdooring robotic task plans:
- Adversarial Attacks on Optimization based Planners [Paper Link] [2021]
- Jailbreaking LLM-controlled robots [Paper Link] [2024]
- BadRobot: Jailbreaking embodied LLMs in the physical world [Paper Link] [2024]
- HASARD: A Benchmark for Vision-Based Safe Reinforcement Learning in Embodied Agents [Paper Link] [2025]
- Robo-Troj: Backdoor Attacks Against Robotic Manipulation in the Physical World [Paper Link] [2025]
Trajectory Planning — adversarial scenarios for autonomous driving trajectory prediction:
- SafeBench: A Benchmarking Platform for Safety Evaluation of Autonomous Vehicles [Paper Link] [2022]
- On adversarial robustness of trajectory prediction for autonomous vehicles [Paper Link] [2022]
- AdvDo: Realistic adversarial attacks for trajectory prediction [Paper Link] [2022]
- Robust inverse constrained reinforcement learning under model misspecification [Paper Link] [2024]
- AdvDiffuser: Generating adversarial safety-critical driving scenarios via guided diffusion [Paper Link] [2024]
Multi-Agent Planning — Byzantine resilience and adversarial communication in swarms:
- Blockchain Technology Secures Robot Swarms: A Comparison of Consensus Protocols and Their Resilience to Byzantine Robots [Paper Link] [2020]
- The Emergence of Adversarial Communication in Multi-Agent Reinforcement Learning [Paper Link] [2021]
- Robot Swarms Neutralize Harmful Behaviors Through Cross-Referencing [Paper Link] [2023]
- Adversarial Machine Learning Attacks and Defences in Multi-Agent Reinforcement Learning [Paper Link] [2024]
Robot Control — adversarial RL, backdoors in policies, and safe VLA models:
- Robust Adversarial Reinforcement Learning [Paper Link] [2017]
- Adversarial Policies: Attacking Deep Reinforcement Learning [Paper Link] [2020]
- Who Is the Strongest Enemy? Towards Optimal and Efficient Evasion Attacks in Deep RL [Paper Link] [2022]
- Diffusion Policy Attacker: Crafting Adversarial Attacks for Diffusion-based Policies [Paper Link] [2024]
- Embodied laser attack: leveraging scene priors to achieve agent-based robust non-contact attacks [Paper Link] [2024]
- SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Constrained Learning [Paper Link] [2025]
- AttackVLA: Benchmarking Adversarial and Backdoor Attacks on Vision-Language-Action Models [Paper Link] [2025]
Human-Agent Interaction — perceived safety and psychological risks:
- Perceived Safety in Physical Human Robot Interaction -- A Survey [Paper Link] [2021]
- A Taxonomy of Factors Influencing Perceived Safety in Human-Robot Interaction [Paper Link] [2023]
- PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety [Paper Link] [2024]
Multi-Agent Collaboration — inter-agent infection and collusion:
- When Autonomy Goes Rogue: Preparing for Risks of Multi-Agent Collusion in Social Systems [Paper Link] [2025]
Tool Use — prompt injection and skill poisoning in tool-using agents:
- RoboCodeX: Multimodal Code Generation for Robotic Behavior Synthesis [Paper Link] [2024]
- STAC: Stealthy and Targeted Attack on Code Agents [Paper Link] [2025]
- Prompt Injection Attack to Tool Selection in LLM Agents [Paper Link] [2025]
Memory — memory poisoning, privacy leakage, and prompt extraction:
- AgentPoison: Red-teaming LLM agents via poisoning memory or knowledge bases [Paper Link] [2024]
- Ghost of the Past: Identifying and Resolving Privacy Leakage of LLM's Memory Through Proactive User Interaction [Paper Link] [2025]
- Topology Matters: Measuring Memory Leakage in Multi-Agent LLMs [Paper Link] [2025]
- Just Ask: Curious Code Agents Reveal System Prompts in Frontier LLMs [Paper Link] [2026]
Self-Evolving — risks from self-improving and hallucinating agents:
- Agent-SafetyBench: Evaluating the Safety of LLM Agents [Paper Link] [2024]
- Embodied Red Teaming for Auditing Robotic Foundation Models [Paper Link] [2024]
- Your Agent May Misevolve: Emergent Risks in Self-evolving LLM Agents [Paper Link] [2025]
Cascading Risks — cross-layer failures, supply chain attacks, and system-level vulnerabilities:
- Spatiotemporal Attacks for Embodied Agents [Paper Link] [2020]
- Secure Robotics: Nexus of Safety, Trust, and Cybersecurity [Paper Link] [2024]
- SafeAgentBench: A Benchmark for Safe Task Planning of Embodied LLM Agents [Paper Link] [2024]
- Automated Discovery of Semantic Attacks in Multi-Robot Navigation [Paper Link] [2025]
- SkillJect: Automating Stealthy Skill-Based Prompt Injection for Coding Agents [Paper Link] [2026]
- ORBIT: A Unified Simulation Framework for Interactive Robot Learning Environments [Paper Link] [Project Link] [2023]
- Gazebo [Paper Link] [Project Link] [2004]
- PyBullet, a Python module for physics simulation for games, robotics and machine learning [Project Link] [2021]
- MuJoCo: A physics engine for model-based control [Paper Link] [Project Link] [2012]
- V-REP: A versatile and scalable robot simulation framework [Project Link] [2013]
- AI2-THOR: An Interactive 3D Environment for Visual AI [Paper Link] [Project Link] [2017]
- CLIPORT: What and Where Pathways for Robotic Manipulation [Paper Link] [Project Link] [2021]
- BEHAVIOR-1K: A Human-Centered, Embodied AI Benchmark with 1,000 Everyday Activities and Realistic Simulation [Paper Link] [Project Link] [2024]
- RLBench: The Robot Learning Benchmark & Learning Environment [Paper Link] [Project Link] [2019]
- MimicGen: A Data Generation System for Scalable Robot Learning using Human Demonstrations [Paper Link] [Project Link] [2023]
- CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks [Paper Link] [Project Link] [2022]
- Meta-World: A Benchmark and Evaluation for Multi-Task and Meta Reinforcement Learning [Paper Link] [Project Link] [2019]
- ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI [Paper Link] [Project Link] [2024]
- HomeRobot: Open-Vocabulary Mobile Manipulation [Paper Link] [Project Link] [2023]
- ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes [Paper Link] [Project Link] [2023]
- Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots [Paper Link] [Project Link] [2023]
- InfiniteWorld: A Unified Scalable Simulation Framework for General Visual-Language Robot Interaction [Paper Link] [Project Link] [2024]
- ProcTHOR: Large-Scale Embodied AI Using Procedural Generation [Paper Link] [Project Link] [2022]
- Holodeck: Language Guided Generation of 3D Embodied AI Environments [Paper Link] [Project Link] [2023]
- PhyScene: Physically Interactable 3D Scene Synthesis for Embodied AI [Paper Link] [Project Link] [2024]
- RoboGen: Towards Unleashing Infinite Data for Automated Robot Learning via Generative Simulation [Paper Link] [Project Link] [2023]
- Genesis: A Universal and Generative Physics Engine for Robotics and Beyond [Project Link] [2025]
- Webots: open-source robot simulator [Paper Link] [Project Link] [2018]
- Unity: A General Platform for Intelligent Agents [Paper Link] [Project Link] [2020]
- ThreeDWorld: A Platform for Interactive Multi-Modal Physical Simulation [Paper Link] [Project Link] [2021]
- iGibson 1.0: A Simulation Environment for Interactive Tasks in Large Realistic Scenes [Paper Link] [Project Link] [2021]
- SAPIEN: A SimulAted Part-based Interactive ENvironment [Paper Link] [Project Link] [2020]
- VirtualHome: Simulating Household Activities via Programs [Paper Link] [Project Link] [2018]
- Modular Open Robots Simulation Engine: MORSE [Paper Link] [Project Link] [2011]
- VRKitchen: an Interactive 3D Virtual Environment for Task-oriented Learning [Paper Link] [Project Link] [2019]
- CHALET: Cornell House Agent Learning Environment [Paper Link] [Project Link] [2018]
- Habitat: A Platform for Embodied AI Research [Paper Link] [Project Link] [2019]
- MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge [Paper Link] [Project Link] [2022]
- ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks [Paper Link] [Project Link] [2019]
- BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning [Paper Link] [Project Link] [2019]
- Gibson Env: Real-World Perception for Embodied Agents [Paper Link] [Project Link] [2018]
- iGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday Household Tasks [Paper Link] [Project Link] [2021]
- RoboTHOR: An Open Simulation-to-Real Embodied AI Platform [Paper Link] [Project Link] [2020]
- LIBERO: Benchmarking Knowledge Transfer for Lifelong Robot Learning [Paper Link] [Project Link] [2023]
- robosuite: A Modular Simulation Framework and Benchmark for Robot Learning [Paper Link] [Project Link] [2020]
- Demonstrating HumanTHOR: A Simulation Platform and Benchmark for Human-Robot Collaboration in a Shared Workspace [Paper Link] [Project Link] [2024]
- Robomimic: What Matters in Learning from Offline Human Demonstrations for Robot Manipulation [Paper Link] [Project Link] [2021]
- Adroit: Manipulators and Manipulation in High Dimensional Spaces [Paper Link] [Project Link] [2016]
- Gymnasium-Robotics [Paper Link] [Project Link] [2024]
- RoboHive: A Unified Framework for Robot Learning [Paper Link] [Project Link] [2024]
- Efficient Grasping from RGBD Images: Learning using a new Rectangle Representation [Paper Link] [2011]
- Real-World Multiobject, Multigrasp Detection [Paper Link] [Project Link] [2018]
- Jacquard: A Large Scale Dataset for Robotic Grasp Detection [Paper Link] [Project Link] [2018]
- Learning 6-DOF Grasping Interaction via Deep Geometry-aware 3D Representations [Paper Link] [Project Link] [2018]
- ACRONYM: A Large-Scale Grasp Dataset Based on Simulation [Paper Link] [Project Link] [2020]
- EGAD! An Evolved Grasping Analysis Dataset for Diversity and Reproducibility in Robotic Manipulation [Paper Link] [Project Link] [2020]
- GraspNet-1Billion: A Large-Scale Benchmark for General Object Grasping [Paper Link] [Project Link] [2020]
- Grasp-Anything: Large-scale Grasp Dataset from Foundation Models [Paper Link] [Project Link] [2023]
- DexGraspNet 2.0: Learning Generative Dexterous Grasping in Large-scale Synthetic Cluttered Scenes [Paper Link] [Project Link] [2024]
- Yale-CMU-Berkeley dataset for robotic manipulation research [Paper Link] [Project Link] [2017]
- AKB-48: A Real-World Articulated Object Knowledge Base [Paper Link] [Project Link] [2022]
- GAPartNet: Cross-Category Domain-Generalizable Object Perception and Manipulation via Generalizable and Actionable Parts [Paper Link] [Project Link] [2022]
- Bi-DexHands: Towards Human-Level Bimanual Dexterous Manipulation [Paper Link] [Project Link] [2022]
- DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects [Paper Link] [Project Link] [2023]
- PartManip: Learning Cross-Category Generalizable Part Manipulation Policy from Point Cloud Observations [Paper Link] [Project Link] [2023]
- Open X-Embodiment: Robotic Learning Datasets and RT-X Models [Paper Link] [Project Link] [2024]
- RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents [Paper Link] [Project Link] [2025]
- ALOHA 2: An Enhanced Low-Cost Hardware for Bimanual Teleoperation [Paper Link] [Project Link] [2024]
- GRUtopia: Dream General Robots in a City at Scale [Paper Link] [Project Link] [2024]
- All Robots in One: A New Standard and Unified Dataset for Versatile, General-Purpose Embodied Agents [Paper Link] [Project Link] [2024]
- VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks [Paper Link] [Project Link] [2024]
- RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation [Paper Link] [Project Link] [2024]
- On Bringing Robots Home [Paper Link] [Project Link] [2023]
- Empowering Embodied Manipulation: A Bimanual-Mobile Robot Manipulation Dataset for Household Tasks [Paper Link] [Project Link] [2024]
- DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset [Paper Link] [Project Link] [2024]
- BridgeData V2: A Dataset for Robot Learning at Scale [Paper Link] [Project Link] [2024]
- RoboAgent: Generalization and Efficiency in Robot Manipulation via Semantic Augmentations and Action Chunking [Paper Link] [Project Link] [2023]
- AgiBot World Colosseum [Paper Link] [Project Link] [2024]
- REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction [Paper Link] [Project Link] [2023]
- OakInk2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion [Paper Link] [Project Link] [2024]
- A Dataset of Relighted 3D Interacting Hands [Paper Link] [Project Link] [2023]
- Human-Agent Joint Learning for Efficient Robot Manipulation Skill Acquisition [Paper Link] [Project Link] [2025]
- RoboNet: Large-Scale Multi-Robot Learning [Paper Link] [Project Link] [2020]
- MT-Opt: Continuous Multi-Task Robotic Reinforcement Learning at Scale [Paper Link] [Project Link] [2021]
- BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning [Paper Link] [Project Link] [2022]
- VIMA: General Robot Manipulation with Multimodal Prompts [Paper Link] [Project Link] [2023]
- FastUMI: A Scalable and Hardware-Independent Universal Manipulation Interface with Dataset [Paper Link] [Project Link] [2024]
- PyRep: Bringing V-REP to Deep Robot Learning [Paper Link] [Project Link] [2024]
- Yet Another Robotics and Reinforcement learning framework for PyTorch [Project Link] [2024]
If this repo helps your work, please use the metadata in CITATION.cff or cite it as:
@misc{yin2025awesomeembodiedai,
title = {Awesome-Embodied-AI},
author = {Cheng Yin and Chenyu Yang and Zhiwen Hu and Yunxiang Mi and Weichen Lin and Yimeng Wang},
year = {2025},
howpublished = {\url{https://github.com/wadeKeith/Awesome-Embodied-AI}},
note = {Curated repository of embodied AI resources}
}

This repo builds on and cross-links with several strong community collections:
