A (LIVE) Comprehensive Review on Leveraging Machine Learning for Multi-Agent Path Finding
A living survey tracking Machine Learning approaches to solving the MAPF Problem.
🕐 Last updated: 2026-06-30
📄 Original Survey (Alkazzi & Okumura - IEEE Access 2024)
📋 Changelog — latest: 2026.06 (+79 papers)
Representation
Representation for Planning - OQ 1
| Paper | Venue | Year | Links |
|---|---|---|---|
| CTRMs: Learning to Construct Cooperative Timed Roadmaps for Multi-Agent Path Planning in Continuous Spaces | AAMAS | 2022 | paper • code • project |
| Avoidance Critical Probabilistic Roadmaps for Motion Planning in Dynamic Environments | ICRA | 2021 | paper |
Environment Optimization - OQ 2
| Paper | Venue | Year | Links |
|---|---|---|---|
| Scaling Multi-Agent Environment Co-Design with Diffusion Models | ICML | 2026 | paper • code |
| Optimization of Edge Directions and Weights for Mixed Guidance Graphs in Lifelong Multi-Agent Path Finding | arXiv | 2026 | paper |
| Differentiable Environment-Trajectory Co-Optimization for Safe Multi-Agent Navigation | arXiv | 2026 | paper |
| Online Guidance Graph Optimization for Lifelong Multi-Agent Path Finding | AAAI | 2025 | paper • code |
| Generative Curricula for Multi-Agent Path Finding via Unsupervised and Reinforcement Learning | JAIR | 2025 | paper • code |
| Co-Optimizing Reconfigurable Environments and Policies for Decentralized Multi-Agent Navigation | TRO | 2025 | paper • project • video |
| Guidance Graph Optimization for Lifelong Multi-Agent Path Finding | IJCAI | 2024 | paper • code • project |
| Learning Neural Traffic Rules | RAL | 2024 | paper |
| Arbitrarily Scalable Environment Generators via Neural Cellular Automata | NeurIPS | 2023 | paper • code |
| Multi-Robot Coordination and Layout Design for Automated Warehousing | IJCAI | 2023 | paper • code |
| Constrained Environment Optimization for Prioritized Multi-Agent Navigation | IEEE Open Journal of Control Systems | 2023 | paper |
| Environment Optimization for Multi-Agent Navigation | ICRA | 2023 | paper |
Environment Generation for MAPF Algorithm Evaluation
| Paper | Venue | Year | Links |
|---|---|---|---|
| QD-MAPPER: A Quality Diversity Framework to Automatically Evaluate Multi-Agent Path Finding Algorithms in Diverse Maps | AAMAS | 2026 | paper • code • project |
Representation for Selection - OQ 3,4
| Paper | Venue | Year | Links |
|---|---|---|---|
| Anytime Automatic Algorithm Selection for the Multi-Agent Path Finding Problem | IEEE Access | 2024 | paper |
| No Panacea in Planning: Algorithm Selection for Suboptimal Multi-Agent Path Finding | arXiv | 2024 | paper |
| Algorithm Selection for Optimal Multi-Agent Path Finding via Graph Embedding | arXiv | 2024 | paper |
| MAPFASTER: A Faster and Simpler take on Multi-Agent Path Finding Algorithm Selection | IROS | 2022 | paper • code |
| MAPFAST: A Deep Algorithm Selector for Multi Agent Path Finding using Shortest Path Embeddings | AAMAS | 2021 | paper • code |
| Algorithm Selection for Optimal Multi-Agent Pathfinding | ICAPS | 2020 | paper • code |
| Automatic algorithm selection in multi-agent pathfinding | arXiv | 2019 | paper |
Planning
Augmenting Existing Solvers - OQ 5,6
Enhancing Conflict-Based Search
| Paper | Venue | Year | Links |
|---|---|---|---|
| Multi-Agent Path Finding Among Dynamic Uncontrollable Agents with Statistical Safety Guarantees | arXiv | 2025 | paper |
| Proactive Conflict Area Prediction for Boosting Search-Based Multi-Agent Pathfinding | IROS | 2025 | paper |
| Conflict Area Prediction for Boosting Search-Based Multi-Agent Pathfinding Algorithms | ICRA | 2024 | paper |
| Accelerating Multi-Agent Planning Using Graph Transformers with Bounded Suboptimality | ICRA | 2023 | paper |
| Learning Node-Selection Strategies in Bounded-Suboptimal Conflict-Based Search for Multi-Agent Path Finding | AAMAS | 2021 | paper |
| Learning to Resolve Conflicts for Multi-Agent Path Finding with Conflict-Based Search | AAAI | 2021 | paper |
Enhancing Prioritized Planning
| Paper | Venue | Year | Links |
|---|---|---|---|
| Learning-guided Prioritized Planning for Lifelong Multi-Agent Path Finding in Warehouse Automation | JAIR | 2026 | paper • code |
| Attention-based Priority Learning for Limited Time Multi-Agent Path Finding | AAMAS | 2024 | paper • code |
| Synthesizing priority planning formulae for multi-agent pathfinding | AIIDE | 2023 | paper |
| Learning a Priority Ordering for Prioritized Planning in Multi-Agent Path Finding | SoCS | 2022 | paper |
Enhancing other MAPF solvers
| Paper | Venue | Year | Links |
|---|---|---|---|
| Graph Attention-Guided Search for Dense Multi-Agent Pathfinding | AAAI | 2026 | paper • code |
| GRAND: Guidance, Rebalancing, and Assignment for Networked Dispatch in Multi-Agent Path Finding | RAL | 2026 | paper |
| Truncated Counterfactual Learning for Anytime Multi-Agent Path Finding | AAAI | 2026 | paper • code • video |
| Discrete Diffusion for Complex and Congested Multi-Agent Path Finding with Sparse Social Attention | arXiv | 2026 | paper |
| LNS2+RL: Combining Multi-Agent Reinforcement Learning with Large Neighborhood Search in Multi-Agent Path Finding | AAAI | 2025 | paper • code |
| Anytime Multi-Agent Path Finding with an Adaptive Delay-Based Heuristic | AAAI | 2025 | paper • code |
| Enhancing PIBT for Multi-Agent Path Finding via MLP-Based Candidate Selection and Priority Perturbation | IEEE Access | 2025 | paper |
| Learn to Refine: Synergistic Multi-Agent Path Optimization for Lifelong Conflict-Free Navigation of Autonomous Vehicles | KDD | 2025 | paper • code |
| Neural Neighborhood Search for Multi-agent Path Finding | ICLR | 2024 | paper • code |
| Adaptive Anytime Multi-Agent Path Finding Using Bandit-Based Large Neighborhood Search | AAAI | 2024 | paper |
| NLNS-MASPF for solving Multi-Agent scheduling and Path-Finding | IROS | 2024 | paper |
| Anytime Multi-Agent Path Finding via Machine Learning-Guided Large Neighborhood Search | AAAI | 2022 | paper |
| Subdimensional Expansion Using Attention-Based Learning For Multi-Agent Path Finding | arXiv | 2021 | paper • code |
Learning-based Policies - OQ 7,8,9,10,11,12
Decentralized
| Paper | Venue | Year | Links |
|---|---|---|---|
| Confidence-Based Curricula for Multi-Agent Path Finding via Reinforcement Learning | JAAMAS | 2026 | paper • code |
| Multi-Agent Reinforcement Learning With Spatial Structure Awareness for Topological Map-Based Path-Finding | RAL | 2026 | paper |
| ORION: Option-Regularized Deep Reinforcement Learning for Cooperative Multi-Agent Online Navigation | RAL | 2026 | paper • code |
| Pairwise is Not Enough: Hypergraph Neural Networks for Multi-Agent Pathfinding | ICLR | 2026 | paper • code • project |
| Learning to Communicate Locally for Large-Scale Multi-Agent Pathfinding | arXiv | 2026 | paper |
| MARL-GPT: Foundation Model for Multi-Agent Reinforcement Learning | AAMAS | 2026 | paper • code • weights • dataset |
| From One to Many: Adaptive Multi-Agent Pathfinding in Heterogeneous Environments | Optical Memory and Neural Networks | 2026 | paper |
| SPARC: Spatial-Aware Path Planning via Attentive Agent Communication | arXiv | 2026 | paper |
| Mean-Field Deep Reinforcement Learning for Multi-Agent Path Finding | RAL | 2026 | paper |
| Spatially Grouped Curriculum Learning for Multi-Agent Path Finding | AAAI | 2026 | paper • code |
| Simulation-Informed Diffusion for Decentralized Multi-robot Motion Planning | arXiv | 2026 | paper |
| Social Behavior as a Key to Learning-based Multi-Agent Pathfinding Dilemmas | AIJ | 2025 | paper • code • project |
| MAPF-GPT: Imitation Learning for Multi-Agent Pathfinding at Scale | AAAI | 2025 | paper • code • project • weights • dataset • notebooks |
| Work Smarter Not Harder: Simple Imitation Learning with CS-PIBT Outperforms Large Scale Imitation Learning for MAPF | ICRA | 2025 | paper • code • project |
| Deploying Ten Thousand Robots: Scalable Imitation Learning for Lifelong Multi-Agent Path Finding | ICRA | 2025 | paper • code • project |
| SRMT: Shared Memory for Multi-agent Lifelong Pathfinding | arXiv | 2025 | paper • code |
| SIGMA: Sheaf-Informed Geometric Multi-Agent Pathfinding | ICRA | 2025 | paper • code |
| Learning Verified Safe Neural Network Controllers for Multi-Agent Path Finding | AAAI | 2025 | paper • video |
| MARF: Cooperative Multi-Agent Path Finding with Reinforcement Learning and Frenet Lattice in Dynamic Environments | ICRA | 2025 | paper |
| Towards Transparent Multi-Agent Autonomous Systems Through Principled Multi-Source Knowledge Distillation | ICRA | 2025 | paper |
| Advancing Learnable Multi-Agent Pathfinding Solvers with Active Fine-Tuning | arXiv | 2025 | paper • code • project |
| MAPF-World: Action World Model for Multi-Agent Path Finding | arXiv | 2025 | paper |
| Towards Information-Optimized Multi-Agent Path Finding: A Hybrid Framework with Reduced Inter-Agent Information Sharing | arXiv | 2025 | paper |
| PC2P: Multi-Agent Path Finding via Personalized-Enhanced Communication and Crowd Perception | IROS | 2025 | paper |
| STF: Spatio-Temporal Fusion-Based Multi-Agent Path-Finding | RAL | 2025 | paper |
| Improving Learnt Local MAPF Policies with Heuristic Search | ICAPS | 2024 | paper • extras |
| Decentralized Monte Carlo Tree Search for Partially Observable Multi-agent Pathfinding | AAAI | 2024 | paper • code |
| Learn to Follow: Decentralized Lifelong Multi-agent Pathfinding via Planning and Learning | AAAI | 2024 | paper • code |
| When to Switch: Planning and Learning for Partially Observable Multi-Agent Pathfinding | IEEE TNNLS | 2024 | paper • code |
| Optimizing Crowd-Aware Multi-Agent Path Finding through Local Communication with Graph Neural Networks | IROS | 2024 | paper • project |
| POAQL: A Partially Observable Altruistic Q-Learning Method for Cooperative Multi-Agent Reinforcement Learning | ICRA | 2024 | paper |
| Crowd Perception Communication-Based Multi-Agent Path Finding With Imitation Learning | RAL | 2024 | paper |
| MFC-EQ: Mean-Field Control with Envelope Q-Learning for Moving Decentralized Agents in Formation | IROS | 2024 | paper • code |
| ALPHA: Attention-based Long-horizon Pathfinding in Highly-structured Areas | ICRA | 2024 | paper • code |
| SCRIMP: Scalable Communication for Reinforcement- and Imitation-Learning-Based Multi-Agent Pathfinding | AAMAS | 2023 | paper • code |
| SACHA: Soft Actor-Critic with Heuristic-Based Attention for Partially Observable Multi-Agent Path Finding | RAL | 2023 | paper • code |
| Learning Selective Communication for Multi-Agent Path Finding | RAL | 2022 | paper • code |
| Multi-agent path finding with prioritized communication learning | ICRA | 2022 | paper • code |
| Distributed Heuristic Multi-Agent Path Finding with Communication | ICRA | 2021 | paper • code |
| Message-Aware Graph Attention Networks for Large-Scale Multi-Robot Path Planning | RAL | 2021 | paper • code |
| PRIMAL2: Pathfinding Via Reinforcement and Imitation Multi-Agent Learning - Lifelong | RAL | 2021 | paper • code |
| Mobile robot path planning in dynamic environments through globally guided reinforcement learning | RAL | 2020 | paper |
| Mapper: Multi-agent path planning with evolutionary reinforcement learning in mixed dynamic environments | IROS | 2020 | paper |
| Graph Neural Networks for Decentralized Multi-Robot Path Planning | IROS | 2019 | paper • code |
| PRIMAL: Pathfinding via Reinforcement and Imitation Multi-Agent Learning | RAL | 2019 | paper • code |
Centralized
| Paper | Venue | Year | Links |
|---|---|---|---|
| Discrete-Guided Diffusion for Scalable and Safe Multi-Robot Motion Planning | AAAI | 2026 | paper |
| Multi-Robot Motion Planning from Vision and Language using Heat-Inspired Diffusion | RAL | 2026 | paper • code • project |
| Train-Small Deploy-Large: Leveraging Diffusion-Based Multi-Robot Planning | arXiv | 2026 | paper |
| DeepFleet: Multi-Agent Foundation Models for Mobile Robots | arXiv | 2025 | paper |
| RAILGUN: A Unified Convolutional Policy for Multi-Agent Path Finding Across Different Environments and Tasks | IROS | 2025 | paper • code |
| Multi-Robot Motion Planning with Diffusion Models | ICLR | 2025 | paper • code • project |
| Simultaneous Multi-Robot Motion Planning with Projected Diffusion Models | ICML | 2025 | paper • code • project |
| Why Solving Multi-agent Path Finding with Large Language Model has not Succeeded Yet | arXiv | 2024 | paper |
| Multi-Agent Path Finding in Continuous Spaces with Projected Diffusion Models | arXiv | 2024 | paper |
Execution
Travel and Action Time Modeling - OQ 13
| Paper | Venue | Year | Links |
|---|---|---|---|
| Conflict Mitigation in Shared Environments using Flow-Aware Multi-Agent Path Finding | ICRA | 2026 | paper |
| From Discrete Plans to Real-World Execution: A World-Model-Driven Framework for Execution-Aware Multi-Agent Path Finding | arXiv | 2025 | paper |
| Traffic Flow Learning Enhanced Large-Scale Multi-Robot Cooperative Path Planning Under Uncertainties | ICRA | 2024 | paper |
| Online Re-Planning and Adaptive Parameter Update for Multi-Agent Path Finding with Stochastic Travel Times | AAMAS | 2023 | paper |
| Congestion Prediction for Large Fleets of Mobile Robots | ICRA | 2023 | paper • project |
| Reinforcement Learning for Zone Based Multiagent Pathfinding under Uncertainty | ICAPS | 2020 | paper |
Failure Prediction - OQ 14
| Paper | Venue | Year | Links |
|---|---|---|---|
| Should I Replan? Learning to Spot the Right Time in Robust MAPF Execution | arXiv | 2026 | paper |
Simulation Environments and Testbeds
| Paper | Venue | Year | Links |
|---|---|---|---|
| CAMAR: Continuous Actions Multi-Agent Routing | AAAI | 2026 | paper • code • poster |
| Advancing MAPF Toward the Real World: A Scalable Multi-Agent Realistic Testbed (SMART) | RAL | 2026 | paper • code • project |
| POGEMA: A Benchmark Platform for Cooperative Multi-Agent Pathfinding | ICLR | 2025 | paper • code |
| SkyRover: A Modular Simulator for Cross-Domain Pathfinding | IJCAI | 2025 | paper • project |
| 100-Mouse System: Scalable Multi-Robot Testbed with State Management User Interface | Journal of Robotics and Mechatronics | 2025 | paper |
| Minigrid & Miniworld: Modular & Customizable Reinforcement Learning Environments for Goal-Oriented Tasks | NeurIPS | 2023 | paper • MiniGrid • MiniWorld |
| VMAS: A Vectorized Multi-Agent Simulator for Collective Robot Learning | DARS | 2022 | paper • code |
| RWARE | NeurIPS | 2020 | paper • code |
| Flatland-rl: Multi-agent reinforcement learning on trains | arXiv | 2020 | paper • code • project |
Interpretable ML for MAPF
| Paper | Venue | Year | Links |
|---|---|---|---|
| Interpretable Multi-Agent Path Finding via Decision Tree Extraction from Neural Policies | AAAI Workshop | 2026 | paper • video |
Surveys & Benchmarks
| Paper | Venue | Year | Links |
|---|---|---|---|
| Reevaluation of Large Neighborhood Search for MAPF: Findings and Opportunities | SoCS | 2025 | paper • code |
| An empirical evaluation of learning-based multi-agent path finding algorithms in warehouse environments | Robotics and Autonomous Systems | 2025 | paper |
| Where Paths Collide: A Comprehensive Survey of Classic and Learning-Based Multi-Agent Pathfinding | arXiv | 2025 | paper |
Note
In 2024, we published "A Comprehensive Review on Leveraging Machine Learning for Multi-Agent Path Finding" (Alkazzi & Okumura), which is available as open access in IEEE Access.
Like any published paper, that review is now frozen in time. Newer papers and approaches that tackle MAPF with Machine Learning techniques are not included, so someone would eventually have to rewrite the review to keep the information up to date.
I feel that an incrementally updated review would benefit the community more than a complete rewrite every few years.
(Dream) Ideally, I would love for this to become a fully written paper that is updated monthly with new references or even new sections (think of a mini-booklet style). As an initial step, I am designing it as a list of references that follows the structure proposed in the original paper. Once it is being updated at a stable rate, I hope to move on to the full paper endeavour.
Open Questions
Each section includes open questions worthy of future investigation as originally proposed in our review. For simplicity, we keep them in this table and reference them as OQ X.
| OQ | Question |
|---|---|
| 1 | What can be beneficial criteria and reliable benchmarks for assessing the quality of environment representation? |
| 2 | What are efficient transition mechanisms between offline and online environment optimization? |
| 3 | What is the appropriate input instance representation for algorithm selection? |
| 4 | What is the appropriate representation for MAPF on non-grid worlds? |
| 5 | How can learning from experience be transferred from smaller to larger instances? |
| 6 | How can we identify and extract effective features to maximize the performance of ML-assisted MAPF algorithms? |
| 7 | Which benchmarking suite of environments and evaluation metrics would best reflect the performance of different techniques? |
| 8 | Which communication strategy is most effective in real-world environments with their inherent challenges? |
| 9 | How can effectively learned implicit communication minimize the need and overhead of explicit communication while achieving comparable outcomes? |
| 10 | How could more advanced IL methods improve the performance of agent-based approaches beyond naive behavior cloning (BC)? |
| 11 | How can we avoid reward shaping to eliminate human bias in the learning process? |
| 12 | How can one construct a dataset of MAPF instances that progressively increases in difficulty? |
| 13 | To what extent should the real-world agent dynamics captured by ML be reflected in MAPF? |
| 14 | How can ML enhance the fault tolerance of MAPF systems? |
Citation
@article{Alkazzi2024mlmapf,
author={Alkazzi, Jean-Marc and Okumura, Keisuke},
journal={IEEE Access},
title={A Comprehensive Review on Leveraging Machine Learning for Multi-Agent Path Finding},
year={2024},
doi={10.1109/ACCESS.2024.3392305}
}