Doklady Rossiiskoi Akademii Nauk. Matematika, Informatika, Protsessy Upravleniya, 2022, Vol. 508, No. 1, pp. 88–93

PLANNING AND LEARNING METHODS FOR MULTI-AGENT NAVIGATION PROBLEMS

K. S. Yakovlev 1,2,*, A. A. Andreychuk 1, A. A. Skrynnik 1, A. I. Panov 1,2

1 Artificial Intelligence Research Institute (AIRI)
Moscow, Russia

2 Federal Research Center “Computer Science and Control” of the Russian Academy of Sciences
Moscow, Russia

* E-mail: Yakovlev@airi.net

Received October 28, 2022
Revised October 31, 2022
Accepted November 3, 2022

Abstract

The multi-agent navigation problem arises, on the one hand, in a wide range of application domains. A classical example is an automated warehouse in which a large number of mobile sorting robots operate simultaneously. On the other hand, the problem is characterized by the absence of universal solution methods that satisfy many (often conflicting) requirements at once. Examples of such criteria include guaranteed optimality of the solutions found, high computational speed, the ability to operate in partially observable environments, etc. This paper surveys modern methods for solving the multi-agent navigation problem. Particular attention is paid to the various formulations of the problem. The differences between learnable and non-learnable solution methods, and the question of when each is applicable, are discussed. A separate analysis is devoted to the experimental software environments required for implementing learning-based approaches.

Keywords: path planning, heuristic search, reinforcement learning, multi-agent systems
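As a toy illustration of the problem the survey addresses (a hypothetical sketch, not an algorithm or code from the paper itself), the snippet below plans collision-free paths for grid agents by prioritized planning: agents are planned one at a time with breadth-first search in space-time, and each planned path is reserved as a set of dynamic obstacles (cells and moves) that later agents must avoid. The grid, function names, and conflict model (vertex and swap conflicts only) are illustrative assumptions.

```python
from collections import deque

def plan(grid, start, goal, res_v, res_e, horizon=50):
    """BFS in space-time: a shortest path from start to goal avoiding
    static obstacles (grid[y][x] == 1), reserved cells (vertex conflicts),
    and reserved moves (head-on swap conflicts)."""
    rows, cols = len(grid), len(grid[0])
    queue, seen = deque([(start, [start])]), {(start, 0)}
    while queue:
        (x, y), path = queue.popleft()
        if (x, y) == goal:
            return path
        t = len(path) - 1
        if t >= horizon:
            continue
        for dx, dy in ((0, 1), (0, -1), (1, 0), (-1, 0), (0, 0)):  # 4 moves + wait
            nx, ny = x + dx, y + dy
            if (0 <= nx < cols and 0 <= ny < rows and grid[ny][nx] == 0
                    and (nx, ny, t + 1) not in res_v           # cell taken at t+1
                    and ((nx, ny), (x, y), t) not in res_e     # head-on swap
                    and ((nx, ny), t + 1) not in seen):
                seen.add(((nx, ny), t + 1))
                queue.append(((nx, ny), path + [(nx, ny)]))
    return None  # no path within the horizon

def prioritized(grid, tasks, horizon=50):
    """Plan agents one by one in priority order, reserving each path."""
    res_v, res_e, paths = set(), set(), []
    for start, goal in tasks:
        path = plan(grid, start, goal, res_v, res_e, horizon)
        if path is None:
            return None  # incomplete: another priority order may still succeed
        paths.append(path)
        for t, cell in enumerate(path):
            res_v.add((*cell, t))
        for t in range(len(path) - 1):
            res_e.add((path[t], path[t + 1], t))
        for t in range(len(path), horizon + 1):  # agent waits at its goal
            res_v.add((*path[-1], t))
    return paths
```

This greedy scheme is fast but neither complete nor optimal; it stands in contrast to the optimal solvers discussed in the survey (e.g., Conflict-Based Search), which resolve conflicts between agents systematically rather than by a fixed priority order.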


No supplementary materials are available.
