OU Yang, GUO Zhengyu, LUO Delin, MIAO Kehua. Collaborative air combat maneuvering decision-making method based on graph convolutional deep reinforcement learning[J]. Chinese Journal of Engineering. DOI: 10.13374/j.issn2095-9389.2023.09.25.004

Collaborative air combat maneuvering decision-making method based on graph convolutional deep reinforcement learning

The effective implementation of multi-unmanned aerial vehicle (UAV) decision making and improvement in the efficiency of coordinated mission execution are currently top priorities of air combat research. To solve the problem of multi-UAV cooperative air combat maneuvering confrontation, a decision-making method based on long short-term memory (LSTM) and dueling graph convolutional deep reinforcement learning is proposed. First, the problem of multi-UAV cooperative air combat maneuvering confrontation is formulated. Second, an LSTM network is introduced into the deep dueling Q network to process air combat information with strong temporal correlation. Further, a graph convolutional network is built as the communication basis between multiple UAVs, a cooperative air combat training framework based on LSTM is constructed, and a dueling graph convolutional deep reinforcement learning algorithm is proposed to improve convergence. In the proposed method, the communication problem between UAVs is transformed into a graph model, where each UAV is regarded as a node and the observation state of the UAV is regarded as the attribute of that node. The graph convolutional layer captures the cooperative relationship between nodes, and communication between UAVs is realized through information sharing. Subsequently, the extracted time-sequenced air combat feature information is input into the LSTM and deep dueling Q networks to evaluate action values. The LSTM network can process sequence information and encode historical states into its hidden state, so the network can better capture temporal dependencies and thus better predict the value function of the current state.
The simulation results show that when the opponent adopts a nonmaneuvering strategy, the UAV formation using the proposed method as its core decision-making strategy can learn a reasonable maneuvering strategy and cooperate to a certain extent against a fixed-strategy opponent. This demonstrates the effectiveness of the algorithm in multi-UAV collaborative air combat maneuvering confrontation problems, enabling UAV formations to achieve teamwork and improve air combat efficiency. In a two-on-one air combat scenario, a greedy algorithm is used as the decision-making strategy of the enemy aircraft. Comparative simulation experiments show that when faced with opponents using fixed rules and strategies, the red team formation can learn reasonable maneuver confrontation strategies and cooperate in the decision-making process to form certain air combat tactics, which improves the combat efficiency of the red team. Compared with the baseline method, the proposed method exhibits a more stable learning process and faster decision-making speed for UAV cooperative air combat.
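The core computation described above can be sketched in a few lines. The following is a minimal, illustrative numpy sketch, not the authors' implementation: each UAV is a graph node whose attribute is its observation vector, a graph-convolution step shares information across the (assumed fully connected) communication graph, and a dueling head splits the shared features into a state value V and action advantages A, aggregated as Q = V + A − mean(A). All dimensions, weights, and the adjacency structure are assumptions for illustration; the LSTM stage between feature extraction and the dueling head is omitted for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n_uav, obs_dim, hid_dim, n_actions = 3, 8, 16, 5

obs = rng.standard_normal((n_uav, obs_dim))        # node attributes (observations)
adj = np.ones((n_uav, n_uav))                      # assumed fully connected comms graph
adj_norm = adj / adj.sum(axis=1, keepdims=True)    # row-normalized adjacency

W_gc = rng.standard_normal((obs_dim, hid_dim)) * 0.1       # graph-conv weights
W_v = rng.standard_normal((hid_dim, 1)) * 0.1              # value stream
W_a = rng.standard_normal((hid_dim, n_actions)) * 0.1      # advantage stream

# Graph convolution: aggregate neighbor observations, then project and activate.
h = np.tanh(adj_norm @ obs @ W_gc)                 # shape (n_uav, hid_dim)

# Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
V = h @ W_v                                        # shape (n_uav, 1)
A = h @ W_a                                        # shape (n_uav, n_actions)
Q = V + A - A.mean(axis=1, keepdims=True)          # one row of Q-values per UAV
```

The mean-subtraction in the dueling aggregation is the standard identifiability trick: it forces the advantages to be zero-mean per state, so V and A contribute distinct information to Q.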
