Review of deep-learning-based simultaneous localization and mapping for mobile robots

  • Abstract: In recent years, deep learning has made significant progress in simultaneous localization and mapping (SLAM) for mobile robots, offering new ways to address the challenges that traditional visual SLAM faces in dynamic environments. This review first summarizes the limitations of traditional visual SLAM in its preprocessing, visual odometry, and loop-closure detection modules. It then focuses on the application of deep learning in visual SLAM, surveying deep-learning-based preprocessing, visual odometry, and loop-closure detection modules and how they improve the robustness and accuracy of visual SLAM. Finally, it discusses the challenges facing deep-learning-based SLAM and looks ahead to future research directions, including lightweight network design, long-term scene modeling, and self-supervised learning, to promote the deployment of deep-learning SLAM in practical applications.

    Abstract: Driven by the new round of the global industrial revolution, the deep integration of new-generation information technology with manufacturing has broadened the applications of mobile robots. Simultaneous localization and mapping (SLAM) is one of the core technologies for the autonomous navigation of mobile robots, and its accuracy directly determines how well mobile robots perform in real-world scenarios. Most mobile robot applications involve dynamic scenes, yet because it relies on a static-world assumption, traditional visual SLAM cannot deliver the localization and mapping accuracy these applications demand. The core of deep learning is the autonomous learning of features and patterns from data with multilayer neural networks. By mimicking the hierarchical information processing of the human brain, it applies successive nonlinear transformations to extract progressively higher-level abstract features, effectively modeling the latent distributions of complex data modalities such as temporal signals, spatial structures, and semantic relationships. Recently, deep learning techniques have made significant progress in the field of SLAM for mobile robots, providing new ideas for addressing the challenges that traditional visual SLAM faces in dynamic environments. This review first summarizes the limitations of traditional visual SLAM in its preprocessing, visual odometry, and loop-closure detection modules, such as sensitivity to illumination changes and texture-poor scenes. We then focus on the application of deep learning in visual SLAM, highlighting deep-learning-based preprocessing, visual odometry, and loop-closure detection modules and how they improve the robustness and accuracy of visual SLAM. These include the latest large models, embodied intelligence, and multimodal fusion approaches. After an in-depth analysis, we identify areas for further optimization and, by comparing the latest research methodologies, outline prospects for subsequent research. Neural radiance fields (NeRFs) and 3D Gaussian splatting (3DGS) are deep-learning-based computer vision techniques that reconstruct continuous 3D scene models from multiview 2D images, the former through implicit neural representations and the latter through explicit, differentiable Gaussian primitives. Mobile robot navigation cannot proceed smoothly without high-precision semantic maps; to enable mobile robots to construct such maps in dynamic environments, the latest NeRF- and 3DGS-based methods are introduced in the preprocessing module. This review also covers several multisensor-input and end-to-end SLAM approaches to enrich the surveyed methods. At the end of each module, the surveyed methods are analyzed, summarizing their individual strengths and weaknesses as well as the directions in which they can be improved. Finally, we discuss the challenges faced by deep-learning-based SLAM and anticipate future research directions, including lightweight network design, long-term scene modeling, and self-supervised learning, to promote the adoption of deep-learning SLAM in practical applications. In short, as deep learning matures, large-model AI technology will also develop rapidly, and robots' understanding of and interaction with their environments will become more diversified. It is a reasonable expectation that large-model AI technology will further improve the localization and mapping performance of mobile robots in dynamic environments.
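To make the visual odometry module named above concrete, the following is a minimal sketch of the end-to-end pose-regression idea the review surveys (DeepVO-style networks): a convolutional encoder ingests a pair of consecutive frames and directly regresses the 6-DoF relative camera motion. All class and variable names are illustrative, not taken from any cited work, and the architecture is deliberately simplified.

```python
# Minimal sketch of learning-based visual odometry: regress the 6-DoF relative
# pose (3-D translation + 3-D axis-angle rotation) from two stacked RGB frames.
# Names and layer sizes are illustrative assumptions, not a surveyed method.
import torch
import torch.nn as nn

class PoseRegressionVO(nn.Module):
    def __init__(self):
        super().__init__()
        # Convolutional encoder over the channel-stacked frame pair (6 input channels).
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=7, stride=2, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        # Separate heads for translation and rotation.
        self.fc_trans = nn.Linear(128, 3)
        self.fc_rot = nn.Linear(128, 3)

    def forward(self, frame_t, frame_t1):
        x = torch.cat([frame_t, frame_t1], dim=1)   # (B, 6, H, W)
        feat = self.encoder(x).flatten(1)           # (B, 128)
        return self.fc_trans(feat), self.fc_rot(feat)

# Usage: relative pose between two 256x256 frames.
vo = PoseRegressionVO()
t, r = vo(torch.rand(1, 3, 256, 256), torch.rand(1, 3, 256, 256))
print(t.shape, r.shape)  # torch.Size([1, 3]) torch.Size([1, 3])
```

In a full system, such a network is trained with a supervised or self-supervised pose loss and its per-frame estimates are chained into a trajectory; the sketch shows only the regression step.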
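The loop-closure detection module admits a similarly compact illustration. A common deep-learning formulation, which this sketch follows under assumed names and an assumed similarity threshold, embeds each keyframe into a fixed-length global descriptor (NetVLAD-style) and flags a loop when a new frame's descriptor is sufficiently close to a stored one.

```python
# Minimal sketch of deep-learning-based loop-closure detection: a CNN produces an
# L2-normalized global descriptor per frame; loop candidates are found by cosine
# similarity against past keyframes. Backbone and threshold are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlobalDescriptor(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(64, dim)

    def forward(self, img):
        d = self.proj(self.backbone(img).flatten(1))
        return F.normalize(d, dim=-1)  # unit length, so dot product = cosine similarity

def detect_loop(query_desc, keyframe_descs, threshold=0.85):
    """Return (best_index, similarity) if some past keyframe is similar enough, else None."""
    sims = keyframe_descs @ query_desc.squeeze(0)
    best = torch.argmax(sims)
    return (best.item(), sims[best].item()) if sims[best] > threshold else None

net = GlobalDescriptor()
past = net(torch.rand(10, 3, 128, 128))                  # descriptors of 10 stored keyframes
hit = detect_loop(net(torch.rand(1, 3, 128, 128)), past)
print(hit)  # e.g., (3, 0.91), or None if no candidate clears the threshold
```

A detected candidate would then be verified geometrically before the pose graph is corrected; that verification step is outside this sketch.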
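Finally, the implicit-neural-representation idea behind NeRF, as described above, can be reduced to its core component: an MLP that maps a positionally encoded 3-D point to color and volume density. The sketch below shows only this query step under assumed names and sizes; ray sampling and the volume-rendering integral that NeRF composites along each ray are omitted.

```python
# Minimal sketch of a NeRF-style implicit scene representation: an MLP maps a
# positionally encoded 3-D point to RGB color and volume density. Illustrative only.
import torch
import torch.nn as nn

def positional_encoding(x, num_freqs=6):
    """Lift coordinates to [x, sin(2^k x), cos(2^k x)] so the MLP can fit high-frequency detail."""
    feats = [x]
    for k in range(num_freqs):
        feats.append(torch.sin((2.0 ** k) * x))
        feats.append(torch.cos((2.0 ** k) * x))
    return torch.cat(feats, dim=-1)

class TinyNeRF(nn.Module):
    def __init__(self, num_freqs=6, hidden=128):
        super().__init__()
        in_dim = 3 * (1 + 2 * num_freqs)  # raw xyz plus one sin/cos pair per frequency
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, 4),  # RGB (3 values) + density (1 value)
        )

    def forward(self, xyz):
        out = self.mlp(positional_encoding(xyz))
        rgb = torch.sigmoid(out[..., :3])   # colors constrained to [0, 1]
        sigma = torch.relu(out[..., 3:])    # non-negative volume density
        return rgb, sigma

# Query 1024 sample points, as a renderer would along its camera rays.
model = TinyNeRF()
rgb, sigma = model(torch.rand(1024, 3))
print(rgb.shape, sigma.shape)  # torch.Size([1024, 3]) torch.Size([1024, 1])
```

3D Gaussian splatting replaces this implicit MLP query with explicit, differentiable Gaussian primitives rasterized onto the image plane, which is why it is typically much faster to render.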

     

