A hierarchical cross-source point cloud registration method based on adaptive instance segmentation

张梦叶; 汪墨涵; 韩义恒; 马楠

doi:10.13374/j.issn2095-9389.2025.07.08.001

AIS-HCSR: A hierarchical cross-source point cloud registration method with adaptive instance segmentation

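Abstract

Summary: While moving, mobile agents use cross-source point cloud registration to fuse data from different sensors and obtain high-precision pose information. This process faces challenges such as density differences between modalities and low field-of-view overlap. Traditional optimization-based and deep learning methods struggle to balance global consistency and local accuracy in complex multi-object environments. To address this, a hierarchical registration method based on adaptive instance segmentation (AIS-HCSR) is proposed. The method builds a three-level progressive framework: the scene level achieves initial matching through adaptive geometric feature encoding that fuses distance and angle features; the object level segments point cloud instances with adaptive Euclidean clustering and combines match propagation with spatial layout verification to eliminate positional ambiguity; the point cloud level completes fine registration through point-to-plane residual optimization and global modality fusion. Experiments on the 3DCSR dataset show that the method's recall exceeds that of the current best method by 5.45 percentage points, with particularly strong performance in complex scenes containing multiple similar objects. In typical mobile-agent systems (such as SLAM), it can also provide robust global constraints at the keyframe/loop-closure or drift-correction stage, offering an effective solution for deploying cross-source point cloud registration in engineering scenarios.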

Abstract: Cross-source point cloud registration is a key technique for multisensor fusion in mobile robotics and intelligent perception, where point clouds acquired from heterogeneous modalities (e.g., LiDAR, depth cameras, and structure-from-motion reconstructions) must be aligned into a unified coordinate system to support reliable pose estimation and 3D scene understanding. In practical indoor and industrial environments, cross-source registration remains challenging owing to large differences in sampling density and spatial distribution, distinct noise patterns, limited field-of-view overlap, and the frequent presence of repeated structures or multiple similar objects. These factors make correspondence search highly ambiguous, so existing optimization-based pipelines and learning-based methods struggle to achieve global consistency and high local accuracy at the same time, particularly in complex multi-object scenes. To address these issues, this paper proposes adaptive instance segmentation hierarchical cross-source registration (AIS-HCSR), a method that registers progressively from the scene level to the object level and finally to the point cloud level. At the scene level, an adaptive geometric feature encoding scheme is designed. This scheme jointly models pairwise distance relations and triplet angle relations and dynamically reweights the two types of geometric embeddings according to local geometric complexity. The resulting geometric structural embedding is injected into a transformer-based geometric perception network, whose self- and cross-attention enable robust feature extraction and initial matching across modalities. Based on the optimized correlation matrix, the top-ranked correspondences are selected to estimate an initial rigid transformation, providing a globally consistent prior for subsequent refinement.
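The last scene-level step, selecting top-ranked correspondences and estimating an initial rigid transformation, can be illustrated with a minimal sketch. The abstract does not say which estimator is used, so the closed-form SVD (Kabsch) solution below is an assumption, as are the function names `top_k_correspondences` and `estimate_rigid_transform`, the value of `k`, and the toy correlation matrix.

```python
import numpy as np


def top_k_correspondences(corr, k):
    """Return (row, col) index arrays of the k largest entries of corr."""
    flat = np.argsort(corr, axis=None)[::-1][:k]
    return np.unravel_index(flat, corr.shape)


def estimate_rigid_transform(src, dst):
    """Least-squares R, t such that dst ~= src @ R.T + t (Kabsch / SVD)."""
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t


# Synthetic check: rotate 30 degrees about z and translate.
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 3))
angle = np.deg2rad(30.0)
R_true = np.array([
    [np.cos(angle), -np.sin(angle), 0.0],
    [np.sin(angle), np.cos(angle), 0.0],
    [0.0, 0.0, 1.0],
])
t_true = np.array([0.5, -0.2, 1.0])
dst = src @ R_true.T + t_true

# Toy correlation matrix (assumption): true matches on the diagonal score 1.0,
# every other pair scores below 0.5.
corr = rng.uniform(0.0, 0.5, size=(50, 50))
corr[np.arange(50), np.arange(50)] = 1.0

rows, cols = top_k_correspondences(corr, k=20)
R_est, t_est = estimate_rigid_transform(src[rows], dst[cols])
```

In this noise-free toy case the recovered `R_est` and `t_est` agree with `R_true` and `t_true` up to floating-point error. With real cross-source data, outlier correspondences would normally call for weighting or a robust wrapper around the same closed-form step.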
At the object level, a density-aware adaptive Euclidean clustering algorithm is introduced to segment each point cloud into instances with explicit physical meaning. An instance correspondence mechanism is then constructed by propagating superpoint matches to instance pairs, computing matching frequency and fusing it with centroid-based positional similarity to form an instance similarity matrix. The optimal instance associations are obtained by solving a bipartite matching problem. To further suppress ambiguities caused by multiple similar objects and improve robustness under imperfect clustering (e.g., over-segmentation or under-segmentation in contact scenarios), a spatial layout consistency verification strategy is proposed. The strategy evaluates triangle-based configurations of instances and filters out instance correspondences that violate global spatial relations to prevent incorrect matches from propagating to later optimization. At the point cloud level, for each matched instance pair, point-to-plane residual minimization is performed to obtain a locally refined transformation with improved convergence under cross-source density gaps. The set of local transformations is then integrated through a global least-squares formulation and solved efficiently to yield a single final transformation, achieving fine registration while preserving global consistency across instances. Experiments on the 3DCSR dataset (including LiDAR–Kinect and Kinect–SfM modality pairs) demonstrate that AIS-HCSR achieves a recall of 81.19%, outperforming the previous state-of-the-art FF-LOGO by 5.45 percentage points, with translation and rotation errors of 0.08 m and 2.42°, respectively. The average end-to-end runtime is 1.33 s per point cloud pair. Ablation studies further verify that the scene-level registration and object-level matching with local-to-global optimization are complementary and jointly contribute to the performance gain. 
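The instance correspondence mechanism described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the fusion weight `alpha`, the Gaussian width `sigma`, the max-normalization of the frequency matrix, and the function names are all hypothetical, and source centroids are assumed to be already moved by the initial scene-level transformation. The bipartite matching is solved here by exhaustive search, which is only practical for a handful of instances; a Hungarian-type solver such as `scipy.optimize.linear_sum_assignment` would be the usual choice at scale.

```python
import itertools

import numpy as np


def instance_similarity(matches, src_labels, dst_labels,
                        src_centroids, dst_centroids,
                        alpha=0.5, sigma=1.0):
    """Fuse superpoint-match frequency with centroid proximity.

    matches: (i, j) superpoint index pairs (source, target).
    src_labels / dst_labels: instance id of each superpoint.
    alpha, sigma: assumed fusion weight and Gaussian width.
    """
    n_src, n_dst = len(src_centroids), len(dst_centroids)
    freq = np.zeros((n_src, n_dst))
    for i, j in matches:
        # Propagate each superpoint match to its instance pair
        freq[src_labels[i], dst_labels[j]] += 1.0
    freq /= max(freq.max(), 1.0)
    diff = src_centroids[:, None, :] - dst_centroids[None, :, :]
    pos = np.exp(-np.sum(diff ** 2, axis=2) / (2.0 * sigma ** 2))
    return alpha * freq + (1.0 - alpha) * pos


def best_assignment(sim):
    """One-to-one assignment maximizing total similarity (brute force)."""
    n_src, n_dst = sim.shape
    if n_src > n_dst:
        raise ValueError("expects n_src <= n_dst")
    best_perm, best_score = None, -np.inf
    for perm in itertools.permutations(range(n_dst), n_src):
        score = sum(sim[i, j] for i, j in enumerate(perm))
        if score > best_score:
            best_perm, best_score = perm, score
    return [(i, j) for i, j in enumerate(best_perm)]


# Toy scene: three instances per cloud, target instances stored in a different order.
src_centroids = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 3.0, 0.0]])
dst_centroids = np.array([[0.1, 3.0, 0.0], [0.0, 0.1, 0.0], [2.0, -0.1, 0.0]])
src_labels = [0, 0, 1, 1, 2, 2]  # instance id of each source superpoint
dst_labels = [1, 1, 2, 2, 0, 0]  # instance id of each target superpoint
# Five correct superpoint matches and one wrong one, (5, 0).
matches = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 0)]

sim = instance_similarity(matches, src_labels, dst_labels,
                          src_centroids, dst_centroids)
assignment = best_assignment(sim)
```

Here the single wrong superpoint match votes for the instance pair (2, 1), but the centroid term keeps that pair's fused similarity low, so the assignment still pairs source instance 2 with target instance 0.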
Overall, AIS-HCSR improves registration robustness in low-overlap and large density-difference settings by explicitly combining scene-scale structural priors with instance-scale geometric constraints, providing an effective solution for cross-source point cloud registration in complex multi-object environments.
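One instance-scale geometric constraint mentioned in the abstract is the triangle-based spatial layout check. The abstract gives no formula for it, so the following is only a plausible sketch: it relies on the fact that a rigid transformation preserves distances, compares the side lengths of centroid triangles in both clouds, and keeps an instance pair when enough of its triangles agree. The tolerance `tol`, the voting threshold `min_ratio`, and the function names are assumptions.

```python
import itertools

import numpy as np


def triangle_sides(centroids, i, j, k):
    """Side lengths of the triangle spanned by three instance centroids."""
    return np.array([
        np.linalg.norm(centroids[i] - centroids[j]),
        np.linalg.norm(centroids[j] - centroids[k]),
        np.linalg.norm(centroids[k] - centroids[i]),
    ])


def layout_consistent_pairs(src_centroids, dst_centroids, pairs,
                            tol=0.2, min_ratio=0.5):
    """Keep instance pairs whose centroid triangles mostly agree in both clouds."""
    votes = [0] * len(pairs)
    total = [0] * len(pairs)
    for a, b, c in itertools.combinations(range(len(pairs)), 3):
        s = triangle_sides(src_centroids, pairs[a][0], pairs[b][0], pairs[c][0])
        d = triangle_sides(dst_centroids, pairs[a][1], pairs[b][1], pairs[c][1])
        consistent = bool(np.all(np.abs(s - d) <= tol))
        for idx in (a, b, c):
            total[idx] += 1
            votes[idx] += int(consistent)
    return [p for p, v, n in zip(pairs, votes, total)
            if n > 0 and v / n >= min_ratio]


# Toy layout: six source instances; the target holds the same layout rotated by
# 90 degrees about z and shifted, plus one unrelated instance (index 6).
src_centroids = np.array([
    [0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 3.0, 0.0],
    [2.0, 3.0, 1.0], [4.0, 1.0, 0.0], [1.0, 5.0, 2.0],
])
R = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
t = np.array([1.0, 2.0, 0.5])
dst_centroids = np.vstack([src_centroids @ R.T + t, [[10.0, 10.0, 10.0]]])

# The last candidate pair, (5, 6), is a deliberately wrong association.
pairs = [(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 6)]
kept = layout_consistent_pairs(src_centroids, dst_centroids, pairs)
```

In this example every triangle that includes the wrong pair fails the side-length test, so that pair receives no votes and is dropped, while each correct pair keeps six consistent triangles out of ten.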
