High-Precision Trajectory Tracking Control of Unmanned Vehicles Based on Deep Reinforcement Learning

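  • Abstract: Unmanned vehicle trajectory tracking suffers from weak dynamic adaptability and insufficient accuracy. To address these problems, this paper formulates the tracking problem as a Markov decision process. It then designs the state space, action space, and reward function for reinforcement learning and proposes a high-precision trajectory tracking control method for unmanned vehicles based on deep reinforcement learning. First, differential compensation terms for the lateral position error and the heading angle error are introduced into the state space, which strengthens the system's response to the rate of change of these errors. Second, traditional reward mechanisms struggle to combine precise rewards and penalties with dynamic adaptation. To overcome this, a dual-mechanism cooperative reward strategy is proposed. It consists of a regionalized reward-and-penalty mechanism based on a smooth step function and an adaptive-weight reward mechanism based on a Gaussian kernel function. Finally, simulations verify the effectiveness of the proposed method. The results show that the proposed deep reinforcement learning tracking controller maintains good tracking accuracy and system stability under a variety of operating conditions. The improved method was compared with traditional control methods and the original reinforcement learning algorithms. It delivers better overall performance in tracking accuracy, dynamic response, and control smoothness, and it maintains stable control under random noise disturbances. These results demonstrate its robustness and adaptability under complex operating conditions.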
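The state-space idea above augments the lateral position error and heading angle error with their rates of change. A minimal Python sketch can illustrate it, under several assumptions. It assumes a fixed control period `dt` and estimates the error rates by backward finite differences. It also uses a hypothetical observation layout `[e_y, de_y, e_psi, de_psi]` and hypothetical helper names (`lateral_error`, `TrackingStateBuilder`) that do not come from the paper. The authors' exact state vector and compensation form may differ.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


def wrap_angle(angle: float) -> float:
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def lateral_error(x: float, y: float, x_r: float, y_r: float, psi_r: float) -> float:
    """Signed offset of the vehicle (x, y) from the reference point (x_r, y_r),
    measured along the left normal of the reference heading psi_r.
    Positive means left of the path (sign convention is an assumption)."""
    return -(x - x_r) * math.sin(psi_r) + (y - y_r) * math.cos(psi_r)


@dataclass
class TrackingStateBuilder:
    """Hypothetical helper that builds the observation
    [e_y, de_y/dt, e_psi, de_psi/dt] from successive tracking errors."""
    dt: float                          # control period in seconds (assumed constant)
    prev_e_y: Optional[float] = None
    prev_e_psi: Optional[float] = None

    def build(self, e_y: float, e_psi: float) -> List[float]:
        e_psi = wrap_angle(e_psi)
        if self.prev_e_y is None or self.prev_e_psi is None:
            # First step: no history yet, so the rate terms start at zero.
            de_y, de_psi = 0.0, 0.0
        else:
            # Backward finite differences approximate the error derivatives.
            de_y = (e_y - self.prev_e_y) / self.dt
            # Wrap the heading difference so crossing +/-pi is not read as a huge rate.
            de_psi = wrap_angle(e_psi - self.prev_e_psi) / self.dt
        self.prev_e_y, self.prev_e_psi = e_y, e_psi
        return [e_y, de_y, e_psi, de_psi]


if __name__ == "__main__":
    builder = TrackingStateBuilder(dt=0.1)
    print(builder.build(0.5, 0.1))   # first step: rate terms are zero
    print(builder.build(0.4, 0.05))  # lateral error shrinking, so de_y is about -1.0
```

Differentiating a measured error amplifies sensor noise. In practice, the finite difference is therefore often low-pass filtered before it enters the observation.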

     

    Abstract: Unmanned vehicle trajectory tracking is a core technology in autonomous driving. It underpins the precise and safe operation of unmanned vehicles and plays an indispensable role in practical applications such as logistics and intelligent transportation. In complex dynamic environments, however, traditional trajectory tracking methods often fail to meet the high precision and reliability these applications require because of their weak dynamic adaptability and insufficient accuracy. To address these limitations, this study formulated unmanned vehicle trajectory tracking as a Markov decision process (MDP). It designed the state space, action space, and reward function for reinforcement learning and developed a high-precision trajectory tracking control method for unmanned vehicles based on deep reinforcement learning. The method was implemented and validated with the deep deterministic policy gradient (DDPG) and twin delayed deep deterministic policy gradient (TD3) algorithms, both of which are suited to continuous control tasks. First, differential compensation terms for the lateral position error and the heading angle error were introduced into the state space to improve the system's responsiveness to the rate of change of the tracking errors. These terms let the agent perceive error trends more accurately during tracking and make control adjustments in advance. Second, traditional reward mechanisms have difficulty balancing precise rewards and penalties with dynamic adaptability, so a dual-mechanism cooperative reward strategy was proposed. The first mechanism is a regionalized reward-and-penalty scheme based on a smooth step function. Reward regions are defined by the position of the unmanned vehicle relative to the desired trajectory, and differentiated rewards and penalties are applied in each region so that the tracking state is rewarded or penalized precisely. The second mechanism is an adaptive weight reward scheme based on a Gaussian kernel function. The Gaussian kernel weights the tracking errors and related factors, which allows the reward function to adjust its weights dynamically to the actual tracking situation and to adapt better to different trajectory tracking scenarios. Finally, comprehensive simulation experiments were conducted to validate the proposed method. The proposed approach was compared with a traditional linear quadratic regulator (LQR) controller and with the original DDPG and TD3 algorithms. It achieved higher tracking accuracy, smoother control actions, and better dynamic response across various trajectory tracking scenarios, including straight-line and sinusoidal trajectories. Furthermore, robustness experiments under random noise disturbances indicated that the proposed method maintained stable control performance and reliable tracking, highlighting its robustness and adaptability in complex and uncertain environments. Overall, the results confirm that the proposed method effectively balances dynamic responsiveness and steady-state precision. By jointly improving the state-space representation and the reward function design, it provides a robust, high-precision solution for unmanned vehicle trajectory tracking under complex dynamic conditions.
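The dual-mechanism reward described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' reward function. The cubic smoothstep `S(t) = 3t^2 - 2t^3` stands in for the unspecified smooth step function. The band widths (`d_in`, `d_out`), reward levels (`r_in`, `r_out`), and kernel widths (`sigma_y`, `sigma_psi`, `sigma_w`) are hypothetical values. The Gaussian kernel `k(e) = exp(-e^2 / (2 sigma^2))` both scores the errors and shifts the weighting. Far from the path, the weight moves toward the lateral error term; near the path, it moves toward heading alignment.

```python
import math


def smoothstep(t: float) -> float:
    """Cubic smooth step: 0 for t <= 0, 1 for t >= 1, with zero slope at both ends."""
    t = min(max(t, 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


def gaussian_kernel(e: float, sigma: float) -> float:
    """Gaussian kernel exp(-e^2 / (2 sigma^2)); equals 1 at e = 0 and decays toward 0."""
    return math.exp(-(e * e) / (2.0 * sigma * sigma))


def region_reward(e_y: float, d_in: float = 0.1, d_out: float = 0.5,
                  r_in: float = 1.0, r_out: float = -1.0) -> float:
    """Regionalized reward/penalty (assumed form): r_in inside |e_y| <= d_in,
    r_out beyond |e_y| >= d_out, smoothly blended in the transition band."""
    s = smoothstep((abs(e_y) - d_in) / (d_out - d_in))
    return r_in + (r_out - r_in) * s


def adaptive_tracking_reward(e_y: float, e_psi: float, sigma_y: float = 0.2,
                             sigma_psi: float = 0.1, sigma_w: float = 0.3) -> float:
    """Kernel-weighted tracking reward (assumed form), bounded in [0, 1]."""
    # Adaptive weight: approaches 1 as the lateral error grows,
    # so far from the path the reward focuses on closing the lateral gap.
    w_lat = 1.0 - gaussian_kernel(e_y, sigma_w)
    w_head = 1.0 - w_lat  # near the path, emphasis shifts to heading alignment
    return (w_lat * gaussian_kernel(e_y, sigma_y)
            + w_head * gaussian_kernel(e_psi, sigma_psi))


def step_reward(e_y: float, e_psi: float) -> float:
    """Total per-step reward: regional term plus adaptive tracking term."""
    return region_reward(e_y) + adaptive_tracking_reward(e_y, e_psi)


if __name__ == "__main__":
    for e_y in (0.0, 0.05, 0.2, 0.3, 0.6):
        print(f"e_y={e_y:.2f}  reward={step_reward(e_y, 0.0):.3f}")
```

The smooth step has zero slope at both band edges. As a result, the regional term changes continuously as the vehicle drifts out of the inner band, which avoids the reward discontinuity of a hard threshold. The Gaussian terms likewise keep the tracking reward bounded and differentiable.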

     
