Research on a regenerative braking strategy for pure electric mining trucks based on a deep reinforcement learning algorithm

YANG Weiwei (杨威威); LUO Denghao (罗登昊); ZHANG Wenming (张文明)

doi:10.13374/j.issn2095-9389.2023.06.01.003

Regenerative braking strategy based on deep reinforcement learning for an electric mining truck

Abstract

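Abstract (Chinese-language version): Taking a pure electric mining truck with a 50 t payload as the research object, this paper proposes a regenerative braking energy-recovery strategy based on a deep reinforcement learning optimization algorithm. First, a mathematical model of the pure electric mining dump truck is established. Energy management strategies that account for changes in payload and road slope are then proposed, one based on the soft actor-critic (SAC) algorithm with automatic entropy adjustment and one based on the deep deterministic policy gradient (DDPG) algorithm. The state variables are vehicle speed, acceleration, vehicle mass, road slope, battery state of charge (SOC), and battery charge-discharge rate; the action variable is the transmission gear; and the reward function is built from battery SOC and battery life. Simulation results show that, relative to the rule-based control strategy, the dynamic-programming-based strategy and the proposed SAC-based and DDPG-based strategies improve energy recovery efficiency by 18.15%, 17.18%, and 16.63%, respectively, and extend battery life by 57.31%, 56.87%, and 57.38%, respectively. Finally, a comparison of the reward curves of the two deep reinforcement learning strategies shows that the proposed SAC-based energy management strategy converges 166.7% faster than the DDPG-based strategy.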

Abstract: With the promotion of national “carbon neutral” and “green mine” strategies, pure electric mining vehicles are crucial in promoting energy conservation and emission reduction in the mining industry. However, “mileage anxiety” is the primary problem limiting their promotion and application. Regenerative braking is an essential technology for improving energy efficiency and reducing the life-cycle costs of pure electric vehicles. Because of harsh driving conditions and substantial changes in load and road slope, the scale and fluctuation of energy demand vary sharply during operation, which affects the energy recovery efficiency and battery life of an electric mining dump truck. Designing reasonable regenerative braking strategies for pure electric mining dump trucks is therefore crucial. This paper takes a 50-ton pure electric mining truck as the research object and proposes a regenerative braking energy-recovery strategy based on a deep reinforcement learning optimization algorithm. First, a mathematical model of the pure electric mining dump truck was established, comprising a permanent magnet synchronous motor, a power battery, a four-speed automated mechanical transmission, and a vehicle longitudinal dynamics model. The power performance of the model was then verified on the Matlab/Simulink simulation platform. Subsequently, energy management strategies that account for load and slope changes were proposed based on two deep reinforcement learning algorithms: soft actor–critic (SAC) and deep deterministic policy gradient (DDPG). The state variables are vehicle speed, acceleration, vehicle mass, road slope, battery state of charge (SOC), and battery charge–discharge rate. The transmission gear is the action variable. The reward function is built from battery SOC and battery lifetime.
Furthermore, an automatic entropy adjustment mechanism is introduced to improve the adaptability of the proposed control strategy to different operating conditions. Simulation results show that, compared with the rule-based control strategy, the dynamic-programming-based strategy and the proposed SAC-based and DDPG-based strategies improve energy efficiency by 18.15%, 17.18%, and 16.63%, respectively, and improve battery lifetime by 57.31%, 56.87%, and 57.38%, respectively. Finally, the reward curves of the SAC-based and DDPG-based strategies are compared to further verify the proposed strategy. The results demonstrate the feasibility of the proposed SAC-based control strategy, which converges 166.7% faster than the DDPG-based strategy.
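
The problem formulation summarized in the abstract (six state variables, gear choice as the action, a reward built from SOC and battery life, and an automatically adjusted entropy temperature) can be illustrated with a short Python sketch. This is not the authors' implementation. The names `State`, `reward`, and `update_log_alpha`, the units, the weights, the squared C-rate wear penalty, and all numeric values are assumptions made for illustration; the temperature step uses the loss form commonly seen in SAC implementations.

```python
import math
from dataclasses import astuple, dataclass

GEARS = (1, 2, 3, 4)  # action space: the four AMT gears named in the abstract


@dataclass(frozen=True)
class State:
    """The six state variables listed in the abstract (units are assumed)."""
    speed_mps: float
    accel_mps2: float
    mass_kg: float
    slope_rad: float
    soc: float      # state of charge in [0, 1]
    c_rate: float   # battery charge-discharge rate


def reward(soc_before, soc_after, c_rate, w_soc=1.0, w_life=0.1):
    """Hypothetical reward: SOC gained in the step minus a battery-wear
    penalty that grows with the squared C-rate. Weights are placeholders."""
    return w_soc * (soc_after - soc_before) - w_life * c_rate ** 2


def update_log_alpha(log_alpha, log_probs, target_entropy, lr=0.01):
    """One gradient-descent step on the commonly used temperature loss
    L = -log_alpha * mean(log_pi + target_entropy)."""
    mean_log_prob = sum(log_probs) / len(log_probs)
    grad = -(mean_log_prob + target_entropy)  # dL/d(log_alpha)
    return log_alpha - lr * grad


if __name__ == "__main__":
    # Arbitrary example values for a loaded truck braking downhill.
    s = State(8.0, -0.5, 85_000.0, -0.08, 0.60, 0.4)
    print(astuple(s), reward(0.60, 0.61, s.c_rate))
    # A near-deterministic gear policy has low entropy, so alpha is raised.
    print(update_log_alpha(0.0, [math.log(0.97)] * 4, target_entropy=0.5))
```

In this sketch the temperature rises when the policy's entropy falls below the target and falls when the policy is more random than the target, which is the sense in which the adjustment is automatic.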
