Distributed Multi-Agent Cooperative Control Algorithm for Coverage and Obstacle Avoidance in Unknown Dynamic Environments

  • Abstract: Addressing rapid-search scenarios for multi-agent systems in unknown dynamic environments, this paper proposes an integrated distributed control architecture that fuses reinforcement learning with model predictive control, aiming to enable the swarm to perform coverage tasks safely and efficiently under the constraints of time-varying obstacles, incomplete prior maps, and limited onboard sensing. Conventional model-based coverage controllers typically rely on accurate environmental models, whereas reinforcement-learning policies may issue unsafe commands when confronted with unknown dynamic obstacles. To address these problems, this paper first employs the multi-agent deep deterministic policy gradient algorithm to learn a cooperative coverage policy for the swarm and to output each agent's nominal coverage command; second, it introduces a model predictive controller combined with control barrier functions that refines the coverage commands so that they satisfy safety constraints; finally, it designs a dynamic weighting fusion mechanism that lets agents adaptively switch among three modes (full-domain scanning, dynamic avoidance, and emergency braking), balancing the swarm's coverage efficiency against its safety guarantees. Experimental results show that the proposed algorithm outperforms existing baseline algorithms in both coverage rate and collision-avoidance success rate; furthermore, semi-physical simulation experiments verify the feasibility and scalability of the method in real-world scenarios.

     

    Abstract: This study addresses the problem of rapid multi-agent search and coverage in unknown, dynamic environments by proposing an integrated distributed control architecture that combines reinforcement learning (RL) with model predictive control (MPC). In such environments, agents must explore quickly while facing time-varying obstacles, incomplete prior maps, and limited onboard sensing—challenges that complicate the simultaneous assurance of efficiency and safety. Conventional model-based coverage controllers depend on accurate environmental models, while purely learning-based policies may issue unsafe commands when faced with unseen dynamics. The objective is to enable the swarm to perform coverage missions safely and efficiently under uncertainty. First, the proposed architecture employs the multi-agent deep deterministic policy gradient (MADDPG) algorithm to learn a cooperative coverage policy that generates nominal coverage actions for each agent. During training, a centralized critic utilizes joint observations and actions to stabilize learning, while trained actors are deployed in a decentralized manner. At execution time, each agent computes its command based on its own observation via the learned actor, eliminating the need for a centralized planner. The coverage reward encourages the rapid exploration of uncovered regions, coordinated behavior to minimize redundant overlaps, and smooth control efforts, while penalties discourage collisions and excessive mode switching. Then, an MPC module incorporating control barrier functions (CBFs) is introduced to refine the nominal coverage commands, ensuring compliance with safety constraints. Safety constraints are encoded via CBF inequalities that enforce obstacle avoidance, interagent separation, and optional workspace boundaries, while the MPC further accounts for actuator limits and short-horizon dynamics. 
At each time step, the controller solves a constrained optimization problem to find the closest safe action to the RL–generated command to maintain coverage performance when the risk level is low. This separation allows the learning module to focus on coverage behavior, while the MPC–CBF module provides online safety certification without retraining the policy. Finally, a dynamic weighting fusion mechanism is designed to enable agents to adaptively switch between three operational modes: full-domain coverage, dynamic avoidance, and emergency braking, to balance coverage efficiency with safety assurance. Specifically, fusion weights are computed from predictive risk indicators such as the minimum forecast distance to obstacles and neighboring agents, and the feasibility margin of the CBF constraints. When the risk is low, the RL action dominates. As risk increases, the MPC–CBF output gains influence. In imminent-collision scenarios, an emergency braking command is triggered, which avoids hard switching and improves robustness. We evaluated the framework using the dynamic-obstacle avoidance success rate as the primary metric and employed two ablation studies to quantify the contribution of each component. Experimental results in scenarios with moving obstacles show that the complete framework achieves a higher obstacle-avoidance success rate than two ablated variants, one without the MPC–CBF module and another without the dynamic weighting fusion mechanism. These results underscore the roles of online safety certification and adaptive-mode fusion in ensuring robust collision avoidance in unknown, dynamic environments. In addition, semi-physical experiments confirmed the feasibility and scalability of the proposed method in real-world scenarios. In summary, the proposed RL–MPC integrated architecture offers a practical and extensible solution for safety-critical distributed coverage and rapid searches in previously unseen, dynamic environments.
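The reward shaping summarized above (rewarding newly covered regions, penalizing redundant overlap, control effort, and collisions) can be sketched as a single per-agent reward term. The weights and the grid-based coverage representation below are illustrative assumptions; the paper's exact shaping is not specified here.

```python
import numpy as np

def coverage_reward(grid, cell_idx, n_visitors, u, collided,
                    w_new=1.0, w_overlap=0.2, w_effort=0.05, w_collision=10.0):
    """One illustrative per-agent reward term for the coverage task.

    grid      : boolean coverage map (True = cell already covered)
    cell_idx  : (row, col) of the cell this agent currently occupies
    n_visitors: number of agents currently inside that same cell
    u         : the agent's control input for this step
    collided  : whether the agent collided during this step
    All weights are hypothetical values, not taken from the paper.
    """
    r = 0.0
    if not grid[cell_idx]:
        r += w_new                              # reward covering a new cell
        grid[cell_idx] = True                   # mark the cell as covered
    r -= w_overlap * max(0, n_visitors - 1)     # discourage redundant overlap
    r -= w_effort * float(np.dot(u, u))         # encourage smooth, small controls
    if collided:
        r -= w_collision                        # heavily penalize collisions
    return r
```

In a full MADDPG setup this term would be accumulated per agent per step, with the centralized critic seeing the joint observations and actions during training only.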
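The "closest safe action" refinement can be illustrated with a minimal single-step sketch. For a single-integrator agent and one sensed circular obstacle, the CBF inequality is linear in the control, so finding the safe action nearest to the RL command reduces to a projection onto a half-space. The barrier choice, gain `alpha`, and box actuator limit are assumptions for illustration; the paper's MPC solves this over a prediction horizon with multiple constraints rather than this one-step special case.

```python
import numpy as np

def cbf_safety_filter(x, u_rl, x_obs, r_safe, alpha=1.0, u_max=1.0):
    """Project the RL command u_rl onto the CBF-safe half-space.

    Single-integrator dynamics x_dot = u with barrier
    h(x) = ||x - x_obs||^2 - r_safe^2  (h >= 0 means safe).
    The CBF condition  grad_h . u >= -alpha * h(x)  is one linear
    constraint a.u >= b, so the QP has a closed-form projection.
    """
    d = x - x_obs
    h = d @ d - r_safe**2        # barrier value
    a = 2.0 * d                  # gradient of h w.r.t. x
    b = -alpha * h               # constraint: a . u >= b
    if a @ u_rl >= b:
        u = u_rl.copy()          # nominal command already safe: keep it
    else:
        # minimum-norm correction: closest point on the boundary a . u = b
        u = u_rl + (b - a @ u_rl) / (a @ a) * a
    # Note: clipping after projection is a simplification; a full MPC-CBF QP
    # would enforce actuator limits and the CBF constraint jointly.
    return np.clip(u, -u_max, u_max)
```

For example, an agent at the origin commanding straight toward an obstacle at (1, 0) with safety radius 0.5 gets its forward speed scaled back so the barrier decay condition holds, while a far-away agent's command passes through unchanged.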
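The three-mode fusion logic described above can be sketched as a smooth blend driven by the minimum forecast distance. The thresholds, the linear weight schedule, and the zero-velocity braking command are hypothetical simplifications; the paper's mechanism also folds in the CBF feasibility margin.

```python
import numpy as np

# Illustrative thresholds (hypothetical values, not from the paper).
D_EMERGENCY = 0.3   # forecast distance below which the agent brakes
D_RISK = 2.0        # forecast distance below which the safe action gains weight

def fuse_commands(u_rl, u_safe, d_min):
    """Blend the RL command and the MPC-CBF command by predicted risk.

    d_min: minimum forecast distance to any obstacle or neighboring agent.
    Returns (mode, command). The weight varies continuously with d_min,
    which avoids hard switching between the three operating modes.
    """
    if d_min <= D_EMERGENCY:
        return "emergency_braking", np.zeros_like(u_rl)   # stop immediately
    if d_min >= D_RISK:
        return "full_domain_coverage", u_rl               # low risk: RL dominates
    # risk weight in (0, 1): approaches 1 near the obstacle, 0 far away
    w = (D_RISK - d_min) / (D_RISK - D_EMERGENCY)
    return "dynamic_avoidance", (1.0 - w) * u_rl + w * u_safe
```

At `d_min` halfway between the two thresholds, the command is an even mix of the RL and safety-filtered actions, matching the abstract's description of the safe output gaining influence as risk grows.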

     
