Abstract:
Trajectory tracking is a core technology in autonomous driving and an essential foundation for the precise, safe operation of unmanned vehicles, with applications in scenarios such as logistics transportation and intelligent transportation. In complex dynamic environments, traditional trajectory tracking methods often fail to meet high-precision, high-reliability requirements because of weak dynamic adaptability and insufficient accuracy. To address these issues, this paper formulates unmanned vehicle trajectory tracking as a Markov Decision Process (MDP), designs the state space, action space, and reward function for reinforcement learning, and proposes a high-precision trajectory tracking control method based on deep reinforcement learning. First, to improve the system's responsiveness to error change rates, differential compensation terms for the lateral position error and the heading angle error are introduced into the state space, enabling the agent to perceive error trends more acutely during tracking and to adjust its control actions in advance. Second, to overcome the difficulty traditional reward mechanisms have in balancing precise reward and punishment with dynamic adaptation, a dual-mechanism reward function coordination strategy is proposed. The first mechanism is a regionalized reward-and-punishment scheme based on a smooth step function: reward regions are divided according to the vehicle's position relative to the desired trajectory, and differentiated rewards and penalties are applied in each region, so that the tracking state is rewarded or punished precisely. The second is an adaptive-weight reward mechanism based on a Gaussian kernel function: the kernel weights the error terms so that the reward function dynamically adjusts its weights according to the actual tracking situation and adapts to different tracking scenarios. Finally, the effectiveness of the proposed method is verified through simulations. The results show that in straight-line trajectory tracking, the improved algorithm corrects initial deviations faster and converges more quickly, returning the unmanned vehicle to the desired trajectory in a shorter time; in sinusoidal trajectory tracking, it fits feature points such as wave crests and troughs more closely. Its tracking accuracy and dynamic adaptability are significantly superior to those of the original Deep Deterministic Policy Gradient (DDPG) and Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithms.
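
The abstract does not give the paper's exact state definition; a minimal sketch of the idea, assuming the state comprises the lateral position error, the heading angle error, and their finite-difference rates as the "differential compensation" terms (all variable names here are illustrative, not the authors'), might look as follows:

```python
import numpy as np

def build_state(e_y, e_psi, prev_e_y, prev_e_psi, dt):
    """Assemble a tracking state with differential compensation terms.

    e_y, e_psi           : current lateral position error and heading angle error
    prev_e_y, prev_e_psi : the same errors at the previous control step
    dt                   : control period in seconds

    The finite-difference terms approximate the error change rates, which
    is what lets the agent anticipate the error trend and act in advance.
    """
    de_y = (e_y - prev_e_y) / dt        # lateral error differential compensation
    de_psi = (e_psi - prev_e_psi) / dt  # heading error differential compensation
    return np.array([e_y, de_y, e_psi, de_psi], dtype=np.float32)
```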
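The region boundaries and reward values of the regionalized mechanism are likewise not specified in the abstract; one plausible sketch, using the classic cubic smoothstep polynomial as the smooth step function (a hypothetical choice) and two assumed boundary distances, is:

```python
import numpy as np

def smoothstep(x, lo, hi):
    """Cubic smoothstep: 0 below lo, 1 above hi, smooth in between."""
    t = np.clip((x - lo) / (hi - lo), 0.0, 1.0)
    return t * t * (3.0 - 2.0 * t)

def regional_reward(e_y, inner=0.1, outer=0.5):
    """Regionalized reward/punishment over the lateral error magnitude.

    |e_y| <= inner : high-precision region, near-maximal reward
    inner..outer   : transition region, reward falls off smoothly
    |e_y| >  outer : off-track region, penalty dominates

    The smooth step avoids reward discontinuities at region boundaries,
    which would otherwise destabilize gradient-based policy updates.
    """
    d = abs(e_y)
    return 1.0 - 2.0 * smoothstep(d, inner, outer)  # +1 on track, -1 far off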
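For the adaptive-weight mechanism, the abstract states only that a Gaussian kernel weights factors such as the errors; a sketch under that assumption (the combination of terms and the kernel widths below are illustrative, not taken from the paper) could be:

```python
import numpy as np

def gaussian_weight(e, sigma):
    """Gaussian kernel exp(-e^2 / (2*sigma^2)): close to 1 for small
    errors and decaying toward 0 as the error grows."""
    return np.exp(-(e ** 2) / (2.0 * sigma ** 2))

def adaptive_reward(e_y, e_psi, sigma_y=0.3, sigma_psi=0.2):
    """Adaptive-weight reward: each error term is weighted by its own
    Gaussian kernel, so the reward emphasis shifts automatically with
    the actual tracking state rather than using fixed weights."""
    w_y = gaussian_weight(e_y, sigma_y)
    w_psi = gaussian_weight(e_psi, sigma_psi)
    return w_y * (1.0 - abs(e_y)) + w_psi * (1.0 - abs(e_psi))
```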