Abstract:
Unmanned aerial vehicles (UAVs) are used in fields as diverse as atmospheric research, emergency rescue, and cargo transportation, reflecting their versatility and efficiency in both civil and military applications. Systems of multiple UAVs offer even greater benefits, enabling coordinated surveillance, complex rescue missions, and strategic military maneuvers. However, controlling such systems is challenging because of their nonlinear dynamics, environmental interference, and model uncertainties, which calls for advanced control methods. Reinforcement learning has emerged as a powerful approach to solving optimal control problems for nonlinear systems, and its ability to learn and adapt in dynamic environments makes it well suited to UAV control. However, existing policy iteration methods often rely on known system dynamics, which restricts them to offline use. Adaptive dynamic programming methods, by contrast, offer real-time tuning but can become unstable while their parameters are being adjusted. A control method is therefore needed that combines real-time adaptability with robust stability during parameter tuning. In this study, we introduce a distributed formation control method for multivehicle systems with unknown nonlinear dynamics. The proposed method approximates the optimal control in real time by combining a reinforcement learning policy iteration scheme with an adaptive dynamic programming continuous approximation scheme, so that the controller retains the stability of reinforcement learning while gaining the real-time tuning flexibility of adaptive dynamic programming. A key innovation is the use of neural networks, trained by gradient descent, to solve the generalized Hamilton–Jacobi–Bellman (GHJB) equations.
The resulting solution is then used to update the controller parameters without interrupting operation. Because parameter tuning is decoupled from control-signal generation, performance remains smooth and stable while the parameters are adjusted in real time. To further enhance stability, the original parameter approximation scheme is modified so that the parameters are updated with integral quantities rather than instantaneous ones. This modification reduces sensitivity to outliers and transient disturbances, making the tuning process more robust. Theoretical analysis shows that integral quantities handle disturbances, particularly transient ones, more effectively by reducing their influence on parameter tuning. Finally, the modified parameter approximation scheme and the formation controller are validated on both linear and nonlinear systems. For the linear time-invariant system, the final parameters converge to their optimal values because the selected basis functions are complete. Simulations comparing the integral-quantity tuning scheme with the original parameter tuning scheme confirm its effectiveness in mitigating disturbances. For complex nonlinear multivehicle systems, the method is also shown to be effective even with an incomplete set of basis functions.
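As a rough illustration of the linear-case claim, the following Python sketch applies the integral-quantity idea to a hypothetical scalar linear time-invariant plant with a single quadratic basis function φ(x) = x², which is complete for this case. It is not the authors' algorithm. The plant parameters `A`, `B`, `Q`, `R`, the interval length, the sampled initial states, the normalized gradient step, and all function names are assumptions made for illustration. The critic weight is fitted by gradient descent on an integral Bellman residual, using the running cost accumulated over an interval instead of an instantaneous Hamilton–Jacobi–Bellman residual. Policy improvement then uses the learned weight. This residual does not require the drift term `A`, while the input gain `B` is assumed known for policy improvement. With a complete basis, the learned weight should approach the Riccati solution, up to Euler discretization error.

```python
import math

# Hypothetical scalar plant dx/dt = A*x + B*u with running cost Q*x^2 + R*u^2.
# A is used only to simulate data; the learner never reads it. B is assumed
# known for the policy-improvement step (an assumption of this sketch).
A, B, Q, R = 1.0, 1.0, 1.0, 1.0
DT = 1e-3    # Euler integration step
STEPS = 50   # steps per interval, so the interval is T = STEPS * DT = 0.05


def riccati_p(a, b, q, r):
    """Stabilizing solution of the scalar ARE 2ap - (b^2/r)p^2 + q = 0."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


def collect_samples(k, x0_list):
    """Run u = -k*x for one interval from each initial state x0.

    Returns (delta_phi, cost_integral) pairs: delta_phi = phi(x(T)) - phi(x0)
    with phi(x) = x^2, and cost_integral is the running cost integrated over T.
    """
    samples = []
    for x0 in x0_list:
        x, cost = x0, 0.0
        for _ in range(STEPS):
            u = -k * x
            cost += (Q * x * x + R * u * u) * DT  # integral quantity
            x += (A * x + B * u) * DT             # Euler step of the plant
        samples.append((x * x - x0 * x0, cost))
    return samples


def evaluate_policy(k, w, lr=1.0, iters=500):
    """Policy evaluation: normalized gradient descent on the squared integral
    Bellman residual e = w * delta_phi + cost_integral."""
    samples = collect_samples(k, [0.5, 1.0, -1.5, 2.0])
    for _ in range(iters):
        grad = 0.0
        for dphi, cost in samples:
            e = w * dphi + cost
            grad += e * dphi / (1.0 + dphi * dphi) ** 2
        w -= lr * grad
    return w


def policy_iteration(k0=2.0, n_iter=8):
    """Alternate critic fitting (weight w) and policy improvement k = B*w/R.

    k0 must be stabilizing (k0 > A/B for this plant).
    """
    k, w = k0, 0.0
    for _ in range(n_iter):
        w = evaluate_policy(k, w)
        k = B * w / R
    return w, k


if __name__ == "__main__":
    w, k = policy_iteration()
    print(f"learned w = {w:.4f}, Riccati p* = {riccati_p(A, B, Q, R):.4f}")
```

In this form, a brief spike in the measured running cost enters the residual weighted by the integration step rather than directly, which is one way to read the robustness argument. The interval endpoint states still enter the residual unfiltered.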