Citation: Ding Yunlong, Kuang Minchi (corresponding author), Zhu Jihong, Zhu Jingyu, Qiao Zhi. Intelligent decision making and target assignment of multi-aircraft air combat based on LSTM-PPO algorithm[J]. Chinese Journal of Engineering. DOI: 10.13374/j.issn2095-9389.2023.10.13.003

Intelligent decision making and target assignment of multi-aircraft air combat based on LSTM-PPO algorithm

Abstract: Traditional approaches to multi-aircraft air combat suffer from inefficient intelligent decision-making, difficulty meeting the demands of complex air combat environments, and unreasonable target assignment. To address these problems, this paper proposes a reinforcement learning-based method for intelligent decision-making and target assignment in multi-aircraft air combat. A long short-term memory (LSTM) network performs feature extraction and situation awareness on the state. The state information, after normalization and feature fusion, is used to train a residual network and a value network, and the agent selects the optimal action for the current situation through proximal policy optimization (PPO). Target assignment is based on threat assessment indices: a comprehensive threat degree is calculated, and the aircraft with the highest threat value is prioritized as the attack target. To validate the algorithm, a 4v4 multi-aircraft air combat experiment was conducted in a digital twin simulation environment built by the authors' research group, and the method was compared with other mainstream reinforcement learning algorithms in the same environment. The results show that the LSTM-PPO algorithm achieves a significantly higher win rate in multi-aircraft air combat than the other mainstream reinforcement learning algorithms, confirming its effectiveness.

     
