梁鸿涛, 王耀南(通讯作者), 华和安, 钟杭, 郑成宏, 曾俊豪, 梁嘉诚, 李政辰. 无人集群系统深度强化学习控制研究进展[J]. 工程科学学报. DOI: 10.13374/j.issn2095-9389.2023.07.30.001
Research Progress on Deep Reinforcement Learning in Control of Unmanned Swarm System[J]. Chinese Journal of Engineering. DOI: 10.13374/j.issn2095-9389.2023.07.30.001

无人集群系统深度强化学习控制研究进展

Research Progress on Deep Reinforcement Learning in Control of Unmanned Swarm System

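  • Abstract: Faced with increasingly complex working environments and task requirements, unmanned swarm systems increasingly need control algorithms with higher processing efficiency and stronger generalization ability and adaptability, and more and more artificial intelligence algorithms are being applied in the unmanned swarm field. Meanwhile, deep reinforcement learning has entered a period of rapid development. Because it combines the powerful representation ability of deep learning with the effective policy search ability of reinforcement learning, deep reinforcement learning has become a promising learning paradigm for achieving general artificial intelligence, and many research results combining unmanned swarms with deep reinforcement learning have emerged. This paper introduces the concept of deep reinforcement learning in terms of its principles and characteristics, analyzes several typical deep reinforcement learning algorithms, then discusses the various control requirements of UAV swarms, and focuses on the many results that combine deep reinforcement learning with UAV swarm control. Finally, it presents the application prospects and the challenges facing the practical deployment of research results in this field.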

     

    Abstract: Faced with complex working environments and task requirements, unmanned swarm systems increasingly need control algorithms with higher processing efficiency and stronger generalization ability and adaptability, and more and more artificial intelligence algorithms are being applied to unmanned swarm systems. At the same time, deep reinforcement learning (DRL) has entered a period of rapid development. By combining the powerful representation ability of deep learning with the effective policy search ability of reinforcement learning, DRL has become a promising learning paradigm for realizing general artificial intelligence, and many studies combining unmanned swarms with DRL have appeared in recent years. This paper introduces the concept of DRL in terms of its principles and characteristics and analyzes a variety of typical DRL algorithms. It then discusses the various control requirements of unmanned aerial vehicle (UAV) swarms and focuses on the many results that combine DRL with UAV swarm control. Finally, it presents views on the application prospects of this combined field and the challenges of putting its research results into practice. The concept of the unmanned swarm originated from the study of the behavior of biological groups. Many species of bees, ants, birds, fish and other creatures exhibit complex group behavior. In these clusters, a large number of independent individuals follow certain aggregation rules to form a coordinated, orderly mechanism of group movement. Such group behavior is distributed, coordinated, self-organized and adaptive to the environment; the groups are structurally stable and can produce intelligence beyond that of the individuals in them.
Similar to biological clusters, in the field of robotics or UAVs, unmanned swarm systems are swarm intelligence systems composed of a large number of homogeneous or heterogeneous unmanned equipment that coordinate their behavior and jointly complete specific tasks through interactive feedback and stimulus-response exchange of information. In practical applications, an unmanned swarm system must cope with open environments, changeable situations, limited resources, and real-time response requirements. This requires the system to have many core collaborative capabilities, such as distributed collaborative perception, intelligent collaborative decision-making, and robust collaborative control. The distributed intelligent collaborative control method based on DRL can meet the requirements of unmanned swarm systems for highly intelligent and robust control. In addition, by using the powerful representation ability of deep learning and the search and optimization ability of reinforcement learning, the problem of insufficient data at a single node during online learning can be addressed with generative adversarial networks, so as to achieve real-time collaborative control and decision-making at larger scale and in higher dimensions, and to complete the design of intelligent decision-making methods for complex and highly dynamic environments.

     
