Intelligent cooperative exploration path planning for UAV swarm in an unknown environment

王伟伦; 尤明; 孙磊; 张秀云; 宗群

doi:10.13374/j.issn2095-9389.2023.10.15.002

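Summary: As UAV missions grow more complex and operating environments more varied, multi-UAV swarm systems have attracted wide attention at home and abroad, and UAV path planning has become an active research topic. Traditional path-planning algorithms generally require prior map information, which is hard to obtain in unknown-environment scenarios such as search and rescue. This paper therefore proposes a reinforcement learning-based cooperative exploration path-planning method for UAV swarms in unknown environments. First, considering the characteristics of the swarm cooperative exploration task and constraints such as dynamics, collision avoidance, and obstacle avoidance, a game model and evaluation criteria for UAV swarm cooperative exploration are established on the basis of a Markov decision process. Second, a reinforcement learning-based cooperative exploration method is proposed: a dual-network architecture built on policy (actor) and critic networks is established, and randomly generated maps are used to strengthen the method's generalization to unknown environments. During exploration, each UAV continuously collects map information and adjusts its own policy based on environmental information and the information shared among individuals, and cooperative swarm exploration in unknown environments is achieved through iterative training. Finally, a virtual simulation platform for UAV swarm cooperative exploration is built on Unity, and comparative experiments against a non-cooperative single-agent algorithm verify that the proposed algorithm has advantages in task success rate, task completion efficiency, and episode reward.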

Abstract: Owing to the increasing complexity of missions and the wide variability of operating environments, a single unmanned aerial vehicle (UAV) is often insufficient to meet practical mission requirements, and multi-UAV systems have vast application potential in areas such as search and rescue. During search and rescue missions, UAVs acquire the location of the target to be rescued and then plan a path to it that circumvents obstacles. Traditional path-planning algorithms require prior knowledge of the obstacle distribution on the map, which may be difficult to obtain in real-world missions. To remove this reliance on prior map information, this paper proposes a reinforcement learning-based approach for the collaborative exploration of unknown environments by multiple UAVs. First, a Markov decision process is employed to establish a game model and task objectives for the UAV cluster, considering the characteristics of collaborative exploration tasks and the various constraints on UAV clusters. To maximize the search and rescue success rate, UAVs must satisfy dynamic and obstacle-avoidance constraints during mission execution. Second, a reinforcement learning-based method for the collaborative exploration of multiple UAVs is proposed. The multiagent soft actor–critic (MASAC) algorithm is used to iteratively train the UAVs' collaborative exploration strategies. The actor network generates UAV actions, while the critic network evaluates the quality of these actions. To enhance the algorithm's generalization capability, training is conducted in randomly generated map environments. To prevent UAVs from being trapped by concave obstacles, a breadth-first search algorithm is used to compute rewards based on the path distance between the UAVs and the targets rather than the straight-line distance. During exploration, each UAV continuously collects map information and shares it with all other UAVs. Each UAV makes its own action decisions based on the environment and the information obtained from other UAVs, and the mission is considered successful if multiple UAVs hover above the target. Finally, a virtual simulation platform for algorithm validation is developed using the Unity game engine. The proposed algorithm is implemented in PyTorch, and bidirectional interaction between the Unity environment and the Python algorithm is achieved through the ML-Agents (Machine Learning Agents) framework. Comparative experiments on the virtual simulation platform compare the proposed algorithm with a non-cooperative single-agent SAC algorithm. The proposed method shows advantages in task success rate, task completion efficiency, and episode rewards, validating the feasibility and effectiveness of the approach.
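The difference between path distance and straight-line distance near a concave obstacle can be illustrated with a minimal sketch. It assumes a 2D occupancy grid with 4-connected motion; the names `bfs_path_distance`, `path_distance_reward`, `GRID`, and the `scale` factor are hypothetical and chosen only for illustration, and the abstract does not specify the exact reward shaping used by the authors.

```python
from collections import deque
import math


def bfs_path_distance(grid, start, goal):
    """Shortest 4-connected path length (in cells) from start to goal.

    grid[r][c] == 1 marks an obstacle cell. Returns None if goal is unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0)])
    visited = {start}
    while queue:
        (r, c), dist = queue.popleft()
        if (r, c) == goal:
            return dist
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in visited:
                visited.add((nr, nc))
                queue.append(((nr, nc), dist + 1))
    return None


def path_distance_reward(grid, uav, target, scale=0.1):
    """Hypothetical shaping term: negative path distance, scaled.

    An unreachable target gets the worst-case penalty (every cell of the grid).
    """
    d = bfs_path_distance(grid, uav, target)
    if d is None:
        d = len(grid) * len(grid[0])
    return -scale * d


# U-shaped (concave) obstacle that opens downward; the UAV starts inside the pocket.
GRID = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
uav, target = (2, 2), (0, 2)
step_out = (3, 2)  # one cell toward the opening of the pocket

straight_before = math.dist(uav, target)       # straight-line distance from the pocket
straight_after = math.dist(step_out, target)   # grows when the UAV backs out
path_before = bfs_path_distance(GRID, uav, target)
path_after = bfs_path_distance(GRID, step_out, target)
```

In this layout, backing out of the pocket increases the straight-line distance to the target but decreases the path distance, so a reward built on path distance favors the move that actually leads around the obstacle.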

Intelligent cooperative exploration path planning for UAV swarm in an unknown environment