Abstract:
With the deepening integration of reinforcement learning and optimal control, adaptive dynamic programming (ADP), an intelligent control method, plays a crucial role in ensuring the effective and stable operation of wastewater treatment control systems. However, the iterative algorithm underlying classical ADP relies on global state-space information and dense sampling of the state space; when applied to wastewater treatment control, it therefore suffers from high computational overhead and slow convergence. To address this problem, this paper introduces a dynamic sparse sampling method and proposes an optimal control algorithm for wastewater treatment based on fast policy iteration. First, by capturing key sensitive information, the algorithm avoids unnecessary computation in regions far from the optimal or near-optimal trajectories, thereby accelerating the screening of candidate control laws. Specifically, for randomly generated candidate control laws, the wastewater state trajectories they drive are sampled; a performance index function initialized with a positive semi-definite function is trained and updated to obtain a sequence of iterative performance index functions. Once this sequence is detected to satisfy the convergence condition, an initial admissible control is obtained. On this basis, the alternating iteration of policy evaluation and policy improvement begins. Here, too, dynamic sparse sampling replaces the traditional dense sampling of the state space: only the key sensitive points on the wastewater state trajectories driven by the current admissible control law are selected as sample points.
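To make the sparse-sampling idea concrete, the following is a minimal sketch (not the paper's implementation) of how "key sensitive points" might be selected along a state trajectory: a point is kept only when the state has moved noticeably since the last kept point, so settled stretches contribute few samples while transients are covered densely. The trajectory, threshold, and distance measure here are all illustrative assumptions.

```python
import numpy as np

def sparse_sample(trajectory, threshold=0.05):
    """Select 'key sensitive' points along a state trajectory.

    A point is kept when the state has moved more than `threshold`
    (Euclidean distance) since the last kept point, so slowly varying
    stretches contribute few samples while fast transients are
    densely covered. This is an illustrative rule, not the paper's.
    """
    trajectory = np.asarray(trajectory, dtype=float)
    kept = [0]                      # always keep the initial state
    last = trajectory[0]
    for i, x in enumerate(trajectory[1:], start=1):
        if np.linalg.norm(x - last) > threshold:
            kept.append(i)
            last = x
    return kept

# Toy two-state trajectory: fast transient followed by a settled tail.
ts = np.linspace(0.0, 5.0, 200)
traj = np.stack([np.exp(-ts), 0.5 * np.exp(-2.0 * ts)], axis=1)
idx = sparse_sample(traj, threshold=0.05)
# Far fewer than 200 samples remain, concentrated in the transient.
```

Under this rule, the sample count adapts to the trajectory's dynamics rather than to a fixed grid over the whole state space, which is the source of the computational savings the abstract describes.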
Meanwhile, the approximate Bellman equation is used to compute the corresponding target values, providing sparse but key effective data for training the critic and actor networks, thereby accelerating the training and convergence of the value function and the optimal control function in the region near the key optimal trajectories. To provide benchmarks for comparison, this paper selects the traditional PID control method, the global policy iteration algorithm, and the local policy iteration algorithm, and evaluates the control performance and computational performance of the proposed algorithm on a low-dimensional wastewater treatment control model and on the high-dimensional international benchmark simulation model BSM1, respectively. The experimental results on the low-dimensional model show that the proposed algorithm achieves nearly the same control performance as the global and local policy iteration algorithms, and all three offer better control accuracy, faster convergence, and stronger robustness than the traditional PID method. At the same time, compared with the global and local policy iteration algorithms, the proposed algorithm significantly reduces the computational burden, accelerates convergence, and shortens the total iteration time. The experimental results on BSM1 show that, under conditions closer to the real environment, the proposed algorithm and the global and local policy iteration algorithms all achieve nearly the same control performance as on the low-dimensional wastewater treatment model. The two-stage systematic comparison experiments verify the effectiveness, low computational cost, and fast convergence of the proposed algorithm, which offers a potentially feasible solution for the fast and intelligent control of wastewater treatment.
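The alternation of policy evaluation and policy improvement on sparsely sampled trajectory data can be sketched on a toy problem. The scalar linear system, quadratic cost, and scalar "critic weight" below are hypothetical stand-ins for the wastewater plant and the critic/actor networks; the sketch only illustrates the iteration structure, with evaluation done as a least-squares fit of the approximate Bellman equation on sparse samples.

```python
import numpy as np

# Hypothetical scalar plant x_{t+1} = a*x_t + b*u_t with stage cost
# q*x^2 + r*u^2 -- a stand-in for the wastewater model.
a, b, q, r = 0.9, 0.5, 1.0, 1.0

def rollout(k, x0=1.0, steps=60):
    """Simulate the closed loop under u = -k*x; return the state path."""
    xs = [x0]
    for _ in range(steps):
        xs.append(a * xs[-1] + b * (-k * xs[-1]))
    return np.array(xs)

def policy_iteration(k, sweeps=30, threshold=1e-3):
    """Alternate critic fits (policy evaluation) and policy improvement,
    training only on sparsely sampled trajectory points."""
    p = 0.0                               # critic weight: V(x) ~= p*x^2
    for _ in range(sweeps):
        xs = rollout(k)
        # Sparse sampling: keep only points where the state still moves
        # noticeably, skipping the settled tail of the trajectory.
        mask = np.abs(np.diff(xs)) > threshold
        x_now, x_next = xs[:-1][mask], xs[1:][mask]
        # Policy evaluation: least-squares fit of the approximate
        # Bellman equation V(x) = cost(x, u) + V(x') at the samples.
        u = -k * x_now
        targets = q * x_now**2 + r * u**2 + p * x_next**2
        p = float(np.sum(targets * x_now**2) / np.sum(x_now**4))
        # Policy improvement: greedy gain for the current critic.
        k = a * b * p / (r + b**2 * p)
    return k, p

# Start from an assumed admissible gain k = 1.0; the loop converges to
# the discrete Riccati fixed point (p ~ 2.12, k ~ 0.62 for these numbers).
k_star, p_star = policy_iteration(1.0)
```

In the paper's setting the quadratic critic is replaced by critic and actor networks and the plant by the wastewater model, but the loop structure is the same: sparse samples feed Bellman targets, the critic is refit, and the policy is improved against the refit critic.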