High-precision trajectory tracking control of unmanned vehicles based on deep reinforcement learning

  • Abstract: To address the weak dynamic adaptability and insufficient accuracy of unmanned vehicle trajectory tracking, this paper formulates the trajectory tracking problem as a Markov decision process, designs the reinforcement learning state space, action space, and reward function, and proposes a high-precision trajectory tracking control method for unmanned vehicles based on deep reinforcement learning. First, to strengthen the system's response to error change rates, lateral position error differential compensation and heading angle error differential compensation are introduced into the state space design. Then, to overcome the difficulty traditional reward mechanisms have in balancing precise reward and punishment with dynamic adaptation, a dual-mechanism reward function coordination strategy is proposed: a regionalized reward and punishment mechanism based on a smooth step function, and an adaptive weight reward mechanism based on a Gaussian kernel function. Finally, the effectiveness of the proposed method is verified through simulation. The results show that the improved algorithm corrects initial deviations faster and converges more quickly in straight-line trajectory tracking, and fits feature points such as wave crests and troughs more closely in sinusoidal trajectory tracking; its tracking accuracy and dynamic adaptability are significantly better than those of the original deep deterministic policy gradient (DDPG) algorithm and the twin delayed deep deterministic policy gradient (TD3) algorithm.
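A minimal sketch (not the authors' code) of the augmented observation described in the abstract: the lateral position error and heading angle error together with their differential compensation terms, approximated here by finite differences. The variable names (e_y, e_psi, dt, v) and the inclusion of vehicle speed in the state vector are illustrative assumptions.

```python
import numpy as np

def build_state(e_y, e_psi, prev_e_y, prev_e_psi, dt, v):
    """Assemble the RL observation vector (names are illustrative assumptions)."""
    de_y = (e_y - prev_e_y) / dt        # lateral position error differential compensation
    de_psi = (e_psi - prev_e_psi) / dt  # heading angle error differential compensation
    # State = [lateral error, its rate, heading error, its rate, vehicle speed]
    return np.array([e_y, de_y, e_psi, de_psi, v], dtype=np.float32)
```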


    Abstract: As a core technology in the field of autonomous driving, unmanned vehicle trajectory tracking is crucial to precise and safe driving and plays an indispensable role in practical scenarios such as logistics transportation and intelligent transportation. In complex dynamic environments, traditional trajectory tracking methods often fail to meet the requirements of high precision and high reliability because of their weak dynamic adaptability and insufficient accuracy. To address these issues, this paper converts the unmanned vehicle trajectory tracking problem into a Markov decision process (MDP), designs the state space, action space, and reward function for reinforcement learning, and proposes a high-precision trajectory tracking control method for unmanned vehicles based on deep reinforcement learning. Firstly, to enhance the system's responsiveness to error change rates, lateral position error differential compensation and heading angle error differential compensation are introduced into the state space design, enabling the agent to perceive error trends more acutely during trajectory tracking and to make control adjustments in advance. Secondly, to address the limitation that traditional reward mechanisms struggle to balance precise reward and punishment with dynamic adaptation, a dual-mechanism reward function coordination strategy is proposed. The first mechanism is a regionalized reward and punishment scheme based on a smooth step function: according to the positional relationship between the unmanned vehicle and the desired trajectory, different reward regions are divided, and differentiated rewards and punishments are applied in each region so that the tracking state is rewarded and penalized precisely. The second is an adaptive weight reward mechanism based on a Gaussian kernel function: the kernel weights the error terms so that the reward function dynamically adjusts its weights according to the actual tracking situation and adapts better to different trajectory tracking scenarios. Finally, the effectiveness of the proposed method is verified through simulations. The results show that in straight-line trajectory tracking the improved algorithm corrects initial deviations faster and converges more rapidly, returning the unmanned vehicle to the desired trajectory in a shorter time; in sinusoidal trajectory tracking it fits feature points such as wave crests and troughs more closely. Its tracking accuracy and dynamic adaptability are significantly superior to those of the original Deep Deterministic Policy Gradient (DDPG) and Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithms.
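A rough sketch of the dual-mechanism reward described above, under assumed parameters rather than the authors' implementation: a sigmoid-shaped smooth step provides the regionalized reward and punishment over a lateral error band, and Gaussian kernels provide the adaptive weighting of the error terms. The band width, steepness, and kernel widths below are assumptions.

```python
import numpy as np

def smooth_step(x, k=20.0):
    """Sigmoid-shaped smooth step; rises from 0 to 1 as x becomes positive."""
    return 1.0 / (1.0 + np.exp(-k * x))

def reward(e_y, e_psi, e_band=0.2, sigma_y=0.5, sigma_psi=0.3):
    # Regionalized reward/punishment: inside the band |e_y| < e_band the smooth
    # step is close to 1 (reward); outside it falls toward 0 (punishment).
    region = smooth_step(e_band - abs(e_y))
    region_term = 2.0 * region - 1.0          # map [0, 1] to [-1, +1]

    # Adaptive weighting with Gaussian kernels: the reward is most sensitive
    # near zero error and its contribution decays smoothly as errors grow.
    w_y = np.exp(-e_y ** 2 / (2.0 * sigma_y ** 2))
    w_psi = np.exp(-e_psi ** 2 / (2.0 * sigma_psi ** 2))

    return region_term + w_y + w_psi
```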
