Citation: ZHANG Yuchen, DUAN Haibin, WEI Chen. Digital twin-based obstacle avoidance method for unmanned aerial vehicle formation control using deep reinforcement learning[J]. Chinese Journal of Engineering. DOI: 10.13374/j.issn2095-9389.2023.09.28.005

Digital twin-based obstacle avoidance method for unmanned aerial vehicle formation control using deep reinforcement learning

  • Abstract: Based on multi-agent deep reinforcement learning, the state space of each individual UAV is built from local information, and the on-policy multi-agent proximal policy optimization (MAPPO) algorithm is used to train the policy network, overcoming environmental uncertainty and the dependence on global information. The concept of the digital twin is also introduced, offering a new approach for resource-intensive algorithms. To address the difficulties of sampling and the pressure on resources, a digital-twin-based architecture is constructed for training the obstacle-avoidance policy model of the UAV formation. First, multiple digital twin environments are built in which the reinforcement learning algorithm interacts and samples before the task begins, pre-training the swarm to acquire basic task capability. Then, data collected in the real environment are used for supplementary training so that the swarm can complete the task more effectively. The effect of this two-stage training architecture is evaluated by comparison, and comparisons with other policy algorithms verify the sample efficiency of MAPPO. Finally, a real-flight validation test is designed to verify the practicality and reliability of the policy model obtained from the twin environments.
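The abstract describes building each UAV's state space from local information only. The following is a minimal illustrative sketch of that idea, not the paper's actual code: a per-UAV observation vector concatenating the UAV's own kinematics, relative positions of in-range neighbors (zero-padded to a fixed number of slots), and the nearest in-range obstacle. All names, the 2D setting, the sensing radius, and the slot count are hypothetical.

```python
# Illustrative sketch (hypothetical, not the paper's code): a per-UAV
# state vector built from local information only.
import numpy as np

def local_observation(own_pos, own_vel, neighbor_pos, obstacle_pos,
                      sense_radius: float = 10.0) -> np.ndarray:
    """Concatenate own state, relative vectors to in-range neighbors
    (zero-padded to a fixed slot count), and the nearest obstacle."""
    own_pos, own_vel = np.asarray(own_pos), np.asarray(own_vel)
    # Relative positions of neighbors within sensing range.
    rel = [np.asarray(p) - own_pos for p in neighbor_pos]
    rel = [r for r in rel if np.linalg.norm(r) <= sense_radius]
    # Pad/truncate to a fixed number of slots so the policy network's
    # input dimension stays constant regardless of neighbor count.
    slots = 3
    rel = (rel + [np.zeros(2)] * slots)[:slots]
    # Relative position of the nearest obstacle (zeros if none sensed).
    obs_rel = [np.asarray(o) - own_pos for o in obstacle_pos]
    nearest = min(obs_rel, key=np.linalg.norm, default=np.zeros(2))
    return np.concatenate([own_pos, own_vel, *rel, nearest])
```

Keeping the observation dimension fixed via padding is one common way to let a single shared policy network serve every UAV in the swarm.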

     

    Abstract: Unmanned aerial vehicle (UAV) swarms have found extensive applications in various fields, playing a crucial role in cluster collaboration. These swarms comprise multiple UAVs that work together to achieve common objectives, and a key challenge in swarm operations is collision-free formation control. Deep reinforcement learning methods have received significant attention for this problem, but deploying them on autonomous UAVs poses challenges, including dependence on global information during training, difficulties in sampling, and excessive resource consumption. To overcome these challenges, this work proposes a novel approach based on multi-agent deep reinforcement learning (MARL) for collision-free formation control of UAV swarms. MARL allows each UAV to interact with a dynamic environment that includes the other UAVs, enabling collaborative decision-making and adaptive behavior. We focus on leveraging local information to establish the state space of each individual UAV. To train the policy network, we employ the multi-agent proximal policy optimization (MAPPO) algorithm, which allows robust learning and policy optimization in a multi-agent setting. We also address the issues of sampling difficulty and resource constraints with digital twin technology, which serves as a bridge between physical entities and virtual models and offers a novel approach to the intelligent collaborative control of drone swarms. By establishing models in virtual space, digital twin technology simulates the real-world space so that the reinforcement learning algorithm can be pre-trained on synthetic experiences. We construct multiple digital twin environments to facilitate interactive sampling and pre-train the swarm with basic task capabilities. We then supplement the training with data collected in real-world environments, enhancing the ability of the swarm to perform optimally in real-world scenarios.
To evaluate the effectiveness of our approach, we compare performance with and without the two-stage training architecture, and to validate the sample efficiency of the on-policy MAPPO algorithm, we conduct a comparative analysis against other policy algorithms, particularly off-policy ones. The results reveal the superior sample efficiency and stability of MAPPO in addressing collision-free formation control. Finally, we conduct a real-flight validation test to verify the practicality and reliability of the policy model derived from the digital twin environments. Overall, this work demonstrates the effectiveness of the proposed approach in enabling UAV swarms to navigate complex environments and achieve collision-free formation control.
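The two ingredients the abstract names can be sketched concretely. Below is a minimal, hypothetical illustration (not the paper's implementation): the PPO-style clipped surrogate objective that MAPPO optimizes per agent, and a toy schedule mirroring the two-stage training, where early updates draw samples from digital twin environments and later updates draw from real-flight data. The function names, the clip range `eps = 0.2`, and the 70/30 stage split are assumptions for illustration only.

```python
# Illustrative sketch (hypothetical, not the paper's code): the PPO
# clipped surrogate objective used by MAPPO, and a toy two-stage
# schedule (digital-twin pre-training, then real-data fine-tuning).
import numpy as np

def clipped_surrogate(ratio, advantage, eps: float = 0.2) -> float:
    """Mean of min(r*A, clip(r, 1-eps, 1+eps)*A), where r is the
    probability ratio of new to old policy and A is the advantage."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return float(np.mean(np.minimum(unclipped, clipped)))

def two_stage_schedule(total_updates: int, pretrain_frac: float = 0.7):
    """Yield ('twin' | 'real', update_idx): sample from the digital twin
    for the first pretrain_frac of updates, then from real flights."""
    switch = int(total_updates * pretrain_frac)
    for i in range(total_updates):
        yield ("twin" if i < switch else "real"), i
```

The clip keeps each policy update close to the data-collecting policy, which is why an on-policy method like MAPPO can train stably on the mixed twin/real sample stream described above.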

     
