Su Muqing, Wang Yin (corresponding author), Pu Ruimin, Yu Meng. Research on the Collaborative Siege Method of Multiple Unmanned Vehicles Based on Reinforcement Learning[J]. Chinese Journal of Engineering. DOI: 10.13374/j.issn2095-9389.2023.09.15.004

Research on the Collaborative Siege Method of Multiple Unmanned Vehicles Based on Reinforcement Learning (Swarm Intelligence Special Issue)

  • Abstract: This paper studies the cooperative target-capture control problem for unmanned ground vehicles and proposes a cooperative capture algorithm built on the Soft Actor-Critic (SAC) framework. To address the large state-space dimensionality of cooperative capture scenarios and the resulting low training efficiency of learning-based algorithms, an attention mechanism is introduced to constrain the effective state-space dimensionality and to mitigate overestimation in the Critic network. To tackle the sparse-reward problem in cooperative capture tasks, the reward function is decoupled into individual rewards and cooperative rewards, which increases the frequency of reward signals and further accelerates convergence. Simulations and experiments show that the proposed method converges faster and outperforms the baseline SAC algorithm in metrics such as capture success rate.
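The reward decoupling summarized above can be illustrated with a short sketch. The functions, constants, and weighting below (capture_radius, the terminal bonus, coop_weight, the angular-gap encirclement penalty) are hypothetical choices for illustration, not the authors' published reward design: each pursuer receives a dense individual reward for closing on the target, while a shared cooperative reward encourages the team to surround it.

```python
import numpy as np

def individual_reward(pursuer_pos, target_pos, prev_dist, capture_radius=0.5):
    """Dense per-vehicle reward: pays each pursuer for closing on the target."""
    dist = np.linalg.norm(pursuer_pos - target_pos)
    reward = prev_dist - dist            # progress term: positive when approaching
    if dist < capture_radius:            # terminal bonus when within capture range
        reward += 10.0
    return reward, dist

def cooperative_reward(pursuer_positions, target_pos):
    """Team-level reward: favors spreading pursuers around the target so that
    the target's remaining escape directions shrink."""
    offsets = [p - target_pos for p in pursuer_positions]
    angles = np.sort([np.arctan2(o[1], o[0]) for o in offsets])
    gaps = np.diff(np.append(angles, angles[0] + 2.0 * np.pi))
    # With n pursuers, a uniform encirclement makes every angular gap 2*pi/n;
    # penalize how far the widest gap exceeds that ideal.
    return -(gaps.max() - 2.0 * np.pi / len(pursuer_positions))

def total_reward(pursuer_pos, all_pursuer_positions, target_pos, prev_dist,
                 coop_weight=0.5):
    """Decoupled reward: individual progress plus a weighted cooperative term."""
    r_ind, dist = individual_reward(pursuer_pos, target_pos, prev_dist)
    r_coop = cooperative_reward(all_pursuer_positions, target_pos)
    return r_ind + coop_weight * r_coop, dist
```

Because both terms emit feedback at every step, reward signals arrive far more often than a single terminal capture reward would, which is the mechanism the abstract credits for the faster convergence.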

     

    Abstract: To address the problem of unmanned vehicles capturing an evasive ground target, a capture algorithm based on multi-agent reinforcement learning is proposed. First, the environment and motion models are established for the cooperative-capture scenario, and the criteria for capture success are defined to meet the requirements of safety and coordination. The Soft Actor-Critic (SAC) algorithm is then adopted as the training framework. However, as the number of agents increases, the multi-agent environment becomes more complex and unstable, which can lead to problems such as dimension explosion or convergence failure, and SAC struggles with high-dimensional state spaces. Moreover, the Critic network of standard SAC treats all state features equally when processing state information, which can cause the value function to be overestimated. In this study, an attention mechanism is introduced into the SAC Critic network to focus on the state features most relevant to the task and to process different state features selectively. This enables the capturing agents to concentrate on the behavior and position of the target agent, enhancing coordination during the pursuit, improving capture effectiveness, and yielding a more accurate estimate of the true value function. Unnecessary maneuvers and wasted effort are reduced, improving efficiency and robustness, and the attention weights adapt dynamically to environmental changes, increasing the method's adaptability to variations in vehicle behavior and state. Furthermore, designing an appropriate reward function is crucial in reinforcement learning, as it directly determines the performance the unmanned vehicles attain during learning. To tackle the sparse rewards of multi-vehicle capture scenarios, this paper decouples the reward function into individual rewards and cooperative rewards so that both global and local incentives are provided: each vehicle is rewarded for its own progress, while the team is rewarded for cooperatively shrinking the target vehicle's feasible action space until the target is captured. Compared with the standard SAC algorithm, the proposed method significantly improves the capture success rate. Finally, simulation experiments comparing the method with other learning-based approaches verify the effectiveness and superiority of the proposed algorithm and of the reward-function design.
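The attention-augmented Critic described in the abstract can be sketched as follows. This is a minimal PyTorch interpretation under assumed dimensions and layer sizes, not the authors' implementation: per-agent observation slices are embedded, self-attention re-weights them so the Q-estimate can concentrate on the most task-relevant agents (typically the target), and the attended features are pooled before the Q-head.

```python
import torch
import torch.nn as nn

class AttentionCritic(nn.Module):
    """SAC-style Q-network with a self-attention layer over per-agent features,
    so the critic can weight task-relevant agents (e.g. the target) instead of
    treating every state feature equally."""

    def __init__(self, obs_dim, act_dim, embed_dim=64, n_heads=4):
        super().__init__()
        self.embed = nn.Linear(obs_dim, embed_dim)   # per-agent feature embedding
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        self.q_head = nn.Sequential(
            nn.Linear(embed_dim + act_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, obs, act):
        # obs: (batch, n_agents, obs_dim) -- the evaluated vehicle's view of every agent
        # act: (batch, act_dim)           -- the evaluated vehicle's action
        h = torch.relu(self.embed(obs))              # (batch, n_agents, embed_dim)
        h, weights = self.attn(h, h, h)              # attend across agents
        pooled = h.mean(dim=1)                       # aggregate attended features
        q = self.q_head(torch.cat([pooled, act], dim=-1))
        return q, weights                            # weights show what the critic attends to
```

In a full SAC setup one would keep two such critics and take the minimum of their Q-estimates, the standard clipped double-Q device against overestimation; the attention weights add a second, feature-level guard by down-weighting irrelevant state entries.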

     

