Abstract:
With the deepening integration of reinforcement learning and optimal control, adaptive dynamic programming (ADP), an intelligent control method, plays a crucial role in ensuring the effective and stable operation of wastewater treatment control systems. However, the iterative algorithm underlying classical ADP relies on global state-space information and dense sampling of the state space; when applied to wastewater treatment control, it therefore suffers from high computational overhead and slow convergence. To address this problem, this paper introduces a dynamic sparse sampling method and proposes an optimal control algorithm for wastewater treatment based on fast policy iteration. First, by capturing key sensitive information, the algorithm avoids unnecessary computation in regions far from the optimal or near-optimal trajectories, thereby accelerating the screening of candidate control laws. Specifically, for randomly generated candidate control laws, the wastewater state trajectories they drive are sampled; a performance index function initialized with a positive semi-definite function is trained and updated to obtain a sequence of iterative performance index functions. Once this sequence is detected to satisfy the convergence condition, an initial admissible control is obtained. On this basis, the alternating iteration of policy evaluation and policy improvement begins. Here, too, dynamic sparse sampling replaces the traditional dense sampling of the state space: only the key sensitive points on the wastewater state trajectories driven by the current admissible control law are selected as sample points.
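To make the sparse-sampling idea concrete, the following is a minimal sketch (not the paper's implementation) of how "key sensitive points" might be selected along a state trajectory: a point is kept only when the state has moved noticeably since the last kept point, so settled stretches contribute few samples while transients are covered densely. The trajectory, threshold, and distance measure here are all illustrative assumptions.

```python
import numpy as np

def sparse_sample(trajectory, threshold=0.05):
    """Select 'key sensitive' points along a state trajectory.

    A point is kept when the state has moved more than `threshold`
    (Euclidean distance) since the last kept point, so slowly varying
    stretches contribute few samples while fast transients are
    densely covered. This is an illustrative rule, not the paper's.
    """
    trajectory = np.asarray(trajectory, dtype=float)
    kept = [0]                      # always keep the initial state
    last = trajectory[0]
    for i, x in enumerate(trajectory[1:], start=1):
        if np.linalg.norm(x - last) > threshold:
            kept.append(i)
            last = x
    return kept

# Toy two-state trajectory: fast transient followed by a settled tail.
ts = np.linspace(0.0, 5.0, 200)
traj = np.stack([np.exp(-ts), 0.5 * np.exp(-2.0 * ts)], axis=1)
idx = sparse_sample(traj, threshold=0.05)
# Far fewer than 200 samples remain, concentrated in the transient.
```

Under this rule, the sample count adapts to the trajectory's dynamics rather than to a fixed grid over the whole state space, which is the source of the computational savings the abstract describes.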
Meanwhile, the approximate Bellman equation is used to compute the corresponding target values, providing sparse but key effective data for training the critic and actor networks, thereby accelerating the training and convergence of the value function and the optimal control function in the region near the key optimal trajectories. To provide benchmarks for comparison, this paper selects the traditional PID control method, the global policy iteration algorithm, and the local policy iteration algorithm, and evaluates the control performance and computational performance of the proposed algorithm on a low-dimensional wastewater treatment control model and on the high-dimensional international benchmark simulation model BSM1, respectively. The experimental results on the low-dimensional model show that the proposed algorithm achieves nearly the same control performance as the global and local policy iteration algorithms, and all three offer better control accuracy, faster convergence, and stronger robustness than the traditional PID method. At the same time, compared with the global and local policy iteration algorithms, the proposed algorithm significantly reduces the computational burden, accelerates convergence, and shortens the total iteration time. The experimental results on BSM1 show that, under conditions closer to the real environment, the proposed algorithm and the global and local policy iteration algorithms all achieve nearly the same control performance as on the low-dimensional wastewater treatment model. The two-stage systematic comparison experiments verify the effectiveness, low computational cost, and fast convergence of the proposed algorithm, which offers a potentially feasible solution for the fast and intelligent control of wastewater treatment.
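The alternation of policy evaluation and policy improvement on sparsely sampled trajectory data can be sketched on a toy problem. The scalar linear system, quadratic cost, and scalar "critic weight" below are hypothetical stand-ins for the wastewater plant and the critic/actor networks; the sketch only illustrates the iteration structure, with evaluation done as a least-squares fit of the approximate Bellman equation on sparse samples.

```python
import numpy as np

# Hypothetical scalar plant x_{t+1} = a*x_t + b*u_t with stage cost
# q*x^2 + r*u^2 -- a stand-in for the wastewater model.
a, b, q, r = 0.9, 0.5, 1.0, 1.0

def rollout(k, x0=1.0, steps=60):
    """Simulate the closed loop under u = -k*x; return the state path."""
    xs = [x0]
    for _ in range(steps):
        xs.append(a * xs[-1] + b * (-k * xs[-1]))
    return np.array(xs)

def policy_iteration(k, sweeps=30, threshold=1e-3):
    """Alternate critic fits (policy evaluation) and policy improvement,
    training only on sparsely sampled trajectory points."""
    p = 0.0                               # critic weight: V(x) ~= p*x^2
    for _ in range(sweeps):
        xs = rollout(k)
        # Sparse sampling: keep only points where the state still moves
        # noticeably, skipping the settled tail of the trajectory.
        mask = np.abs(np.diff(xs)) > threshold
        x_now, x_next = xs[:-1][mask], xs[1:][mask]
        # Policy evaluation: least-squares fit of the approximate
        # Bellman equation V(x) = cost(x, u) + V(x') at the samples.
        u = -k * x_now
        targets = q * x_now**2 + r * u**2 + p * x_next**2
        p = float(np.sum(targets * x_now**2) / np.sum(x_now**4))
        # Policy improvement: greedy gain for the current critic.
        k = a * b * p / (r + b**2 * p)
    return k, p

# Start from an assumed admissible gain k = 1.0; the loop converges to
# the discrete Riccati fixed point (p ~ 2.12, k ~ 0.62 for these numbers).
k_star, p_star = policy_iteration(1.0)
```

In the paper's setting the quadratic critic is replaced by critic and actor networks and the plant by the wastewater model, but the loop structure is the same: sparse samples feed Bellman targets, the critic is refit, and the policy is improved against the refit critic.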