Formation control method for multiple flight vehicles based on adaptive dynamic programming

肖杰; 汤清璞; 李嘉乐; 李国飞

doi:10.13374/j.issn2095-9389.2024.06.04.002

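Abstract: Reinforcement learning theory offers significant advantages for solving optimal control problems in nonlinear systems. For optimal formation control of unmanned aerial vehicles, existing reinforcement learning policy iteration methods require explicit knowledge of the dynamic equations and can usually only run offline, whereas adaptive dynamic programming methods suffer from poor stability during parameter tuning. A formation control method that can run online while keeping parameter tuning stable is therefore urgently needed. For multivehicle systems whose models contain unknown nonlinear terms, this paper proposes a distributed formation control method that approximates the optimal solution online. By combining reinforcement learning policy iteration with the online continuous approximation of adaptive dynamic programming, the method inherits the stability of the former and the online tuning capability of the latter, so that the tuning results are less affected by strong transient disturbances. Theoretical analysis proves that the method improves the robustness of the formation control system under disturbances. Finally, simulations on a linear time-invariant system and on a nonlinear multivehicle system verify the effectiveness of the proposed method.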

Abstract: Unmanned aerial vehicles (UAVs) are widely used across various fields, including atmospheric research, emergency rescue, and cargo transportation, demonstrating their versatility and efficiency in both civil and military applications. Systems consisting of multiple UAVs offer even greater benefits, enabling coordinated surveillance, complex rescue missions, and strategic military maneuvers. However, controlling such systems poses considerable challenges because of their nonlinear dynamics, environmental interference, and uncertainties in system models. To address these challenges, advanced control methods are essential. Reinforcement learning theory has emerged as a powerful approach to solving optimal control problems in nonlinear systems. Its ability to learn and adapt to dynamic environments makes it particularly well suited to UAV control. However, existing policy iteration methods often rely on known system dynamics, which restricts them to offline use. Although adaptive dynamic programming methods offer real-time tuning capabilities, they tend to experience stability issues during parameter adjustment. This highlights the need for a control method that balances real-time adaptability with stable parameter tuning. In this study, we introduce a distributed formation control method designed for multivehicle systems with unknown nonlinear dynamics. The proposed method allows for the real-time approximation of optimal control by combining two approaches: a reinforcement learning policy iteration scheme and an adaptive dynamic programming continuous approximation scheme. This combination allows the controller to retain the stability of policy iteration while benefiting from the real-time tuning flexibility of adaptive dynamic programming. A key innovation of this method is the use of neural networks, trained by gradient descent, to solve the generalized Hamilton–Jacobi–Bellman equation.
The obtained solution is then used to update the controller parameters without interrupting operation. Separating parameter adjustment from control signal generation ensures smooth and stable performance during real-time tuning. To further enhance stability, the original parameter approximation scheme is modified so that the parameters are updated from integral quantities rather than instantaneous quantities. This modification reduces the sensitivity to outliers and transient disturbances, thereby making the regulation process more robust. Theoretical analysis shows that integral quantities are more effective in handling disturbances, particularly transient disturbances, because they reduce the influence of such disturbances on parameter tuning. Finally, the effectiveness of the updated parameter approximation scheme, along with the formation controller, is validated through tests on both linear and nonlinear systems. For the linear time-invariant system, the final parameters converge to their optimal values owing to the completeness of the selected basis functions. The effectiveness of the integral-quantity update in mitigating interference is validated through simulations that compare it with the original parameter regulation scheme. For complex nonlinear multivehicle systems, the method’s efficacy is also demonstrated, even with incomplete basis functions.
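
To make the idea of updating parameters from integral rather than instantaneous quantities concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a hypothetical scalar plant x' = a*x + b*u with running cost q*x^2 + r_u*u^2, a fixed linear policy u = -k*x, and a one-term critic V(x) ≈ w*x^2. The weight w is adjusted by normalized gradient descent on the integral Bellman residual accumulated over a short window. All names and numerical values are illustrative assumptions.

```python
def fit_critic_weight(a=-1.0, b=1.0, q=1.0, r_u=1.0, k=0.0,
                      window=0.1, dt=1e-3, alpha=1.0, episodes=300):
    """Estimate w in V(x) ~= w * x**2 for the fixed policy u = -k * x.

    Hypothetical scalar plant x' = a*x + b*u with running cost
    q*x**2 + r_u*u**2; every name and value here is an assumption.
    """
    w = 0.0
    steps = int(round(window / dt))
    for ep in range(episodes):
        x = 1.0 + 0.5 * (ep % 3)           # restart from varied initial states
        x_start, cost = x, 0.0
        for _ in range(steps):             # forward-Euler rollout over one window
            u = -k * x
            cost += (q * x * x + r_u * u * u) * dt
            x += (a * x + b * u) * dt
        d_phi = x * x - x_start * x_start  # phi(x(t+T)) - phi(x(t)), phi = x^2
        residual = w * d_phi + cost        # integral Bellman residual
        # Normalized gradient step on 0.5 * residual**2 with respect to w.
        w -= alpha * residual * d_phi / (1.0 + d_phi * d_phi)
    return w


def exact_weight(a, b, q, r_u, k):
    """Closed-form w for the stable closed loop x' = (a - b*k) * x."""
    return (q + r_u * k * k) / (2.0 * (b * k - a))


if __name__ == "__main__":
    print(fit_critic_weight(), exact_weight(-1.0, 1.0, 1.0, 1.0, 0.0))
```

With the default values, the estimate should settle close to the closed-form weight from `exact_weight`, up to the Euler discretization error. Because the residual uses the cost accumulated over the whole window, a disturbance lasting a single sample enters it scaled by `dt`, which is the intuition behind the robustness claim above.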

Formation control method for multiple flight vehicles based on adaptive dynamic programming