In this paper, a multi-modal reinforcement learning neural network model is proposed to solve the quadruped robot motion planning problem. Learning-based motion planning has made great progress in robotics applications, but most approaches still rely on domain randomization to train an agent without exteroceptive sensing that eventually generalizes to challenging terrain. The main idea of this paper is that proprioceptive information from encoders and inertial measurement units only provides contact measurements for immediate reaction, whereas an agent equipped with exteroceptive sensors such as LiDAR and cameras can learn to traverse environments with obstacles and uneven terrain autonomously, planning by predicting environmental changes many steps in advance. Therefore, this paper combines the quadruped robot's proprioceptive information with on-board exteroceptive sensor information: each individual modality is encoded to extract effective features, the attention mechanism of a Transformer layer then fuses the multi-modal information to support the robot's action decisions, and an Actor-Critic reinforcement learning architecture performs trial-and-error training in a simulation environment, finally providing a safe motion strategy for the quadruped robot. Specifically, in the model design, we use a fully connected neural network to encode the proprioceptive state vector, a ConvNet to encode the image information, and a PointNet to encode the point cloud information. We aim to minimize information loss while improving the training and inference speed of the model, so as to better meet the real-time and decision-quality requirements of the task.
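The encoder-plus-fusion design described above can be sketched as follows. This is a minimal illustrative PyTorch implementation, not the paper's actual network: all layer sizes, the action dimension, and the use of mean-pooling over fused tokens are assumptions, and the Critic head is omitted for brevity.

```python
import torch
import torch.nn as nn

class MultiModalPolicy(nn.Module):
    """Illustrative sketch: per-modality encoders + Transformer fusion + actor head."""

    def __init__(self, proprio_dim=48, d_model=64, action_dim=12):
        super().__init__()
        # Fully connected encoder for the proprioceptive state (joint encoders, IMU).
        self.proprio_enc = nn.Sequential(
            nn.Linear(proprio_dim, 128), nn.ReLU(), nn.Linear(128, d_model))
        # Small ConvNet encoder for the (here single-channel) image input.
        self.img_enc = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, d_model))
        # PointNet-style encoder: shared per-point MLP followed by max-pooling.
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, d_model))
        # Transformer layer fuses the three modality tokens via self-attention.
        self.fusion = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, batch_first=True)
        # Actor head maps the fused representation to an action vector.
        self.actor = nn.Linear(d_model, action_dim)

    def forward(self, proprio, image, points):
        t1 = self.proprio_enc(proprio)                  # (B, d_model)
        t2 = self.img_enc(image)                        # (B, d_model)
        t3 = self.point_mlp(points).max(dim=1).values   # (B, d_model)
        tokens = torch.stack([t1, t2, t3], dim=1)       # (B, 3, d_model)
        fused = self.fusion(tokens).mean(dim=1)         # (B, d_model)
        return self.actor(fused)

policy = MultiModalPolicy()
action = policy(torch.randn(2, 48),          # proprioceptive state batch
                torch.randn(2, 1, 64, 64),   # image batch
                torch.randn(2, 256, 3))      # point cloud batch (256 points)
print(tuple(action.shape))  # (2, 12)
```

Because each modality is reduced to a single token before fusion, a failed sensor can in principle be handled by masking its token in the attention layer, which is consistent with the robustness claim below.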
The proposed method is evaluated through multiple sets of ablation experiments in a challenging simulation environment containing various obstacles and uneven terrain, and we observe that it effectively improves the obstacle avoidance success rate. In addition, the proposed algorithm remains reliable when one or more sensor modalities fail or the environment is unknown.