Remote sensing image segmentation method based on a dynamic optimized detail-aware network

  • Abstract: Existing remote sensing image segmentation models, such as those based on Convolutional Neural Networks (CNN) and the Transformer framework, have achieved great success, but they still struggle to fully preserve the details of the original encoder feature maps and to dynamically capture global contextual information. Therefore, based on a CNN-Transformer hybrid framework, a novel segmentation method, the Dynamic Optimized Detail-Aware Network (DODNet), is proposed. First, the encoder adopts ResNeXt-50 as the backbone network, and a Multi-Subtraction Perception Module (MSPM) is designed to collect the spatial detail differences between multi-scale feature maps, effectively reducing redundant information. Then, a Dynamic Information Fusion Block (DIFB) is designed in the decoder; it combines a global bi-level routing self-attention branch with a local attention branch to improve the acquisition of both global and local information. Finally, a new channel-spatial attention module, the Unified Feature Extractor (UFE), is proposed to further capture semantic and contextual information. Quantitative and visual analyses of comparison and ablation experiments on the two classic public datasets Vaihingen and Potsdam show that the proposed method outperforms eight state-of-the-art segmentation methods in F1 score, overall accuracy (OA), and mean intersection over union (mIoU); in particular, the mIoU reaches 84.96% and 87.64%, respectively, verifying the superior performance of the proposed method in segmenting high-resolution remote sensing images with complex backgrounds, large intra-class variance, and small inter-class variance.

     

    Abstract: Semantic segmentation technology has important application value in the field of remote sensing image processing and has been widely used in many fields. However, the complexity of high-resolution remote sensing images is mainly reflected in the following aspects: complex background interference, large intra-class differences, and obvious inter-class similarities, resulting in blurred target boundaries. At the same time, the scale of target objects in an image varies greatly (buildings, vegetation, roads, etc., differ markedly in size), which further exacerbates the challenge of the segmentation task. Existing remote sensing image segmentation models, such as those based on Convolutional Neural Networks (CNN) and Transformer frameworks, have achieved great success. However, they still face challenges such as difficulty in fully preserving the detailed feature maps of the original encoder and in dynamically capturing global contextual information. Therefore, based on a CNN-Transformer hybrid framework, a novel segmentation method called the Dynamic Optimized Detail-Aware Network (DODNet) is proposed. ResNeXt-50 is first adopted as the backbone network in the encoder, and a multi-subtraction perception module (MSPM) is designed to collect the spatial detail differences between multi-scale feature maps, effectively reducing redundant information. Then, a dynamic information fusion block (DIFB) is designed in the decoder, which combines a global bi-level routing self-attention branch and a local attention branch. The global bi-level routing self-attention branch first utilizes a learnable regional routing network to filter out low-association background areas, and then performs fine-grained attention calculation within the retained semantic key windows. This effectively addresses the dual challenges of background interference and computational efficiency in remote sensing image processing, achieving efficient global modeling. The local attention branch uses multi-scale convolutions to compensate for the local information that the global bi-level routing self-attention branch finds difficult to capture. Finally, a new channel-spatial attention module, the unified feature extractor (UFE), is proposed to further acquire semantic and contextual information. Quantitative and visual analyses based on comparison and ablation experiments on the Vaihingen and Potsdam datasets show that DODNet outperforms eight state-of-the-art segmentation methods in terms of F1 score, overall accuracy (OA), and mean intersection over union (mIoU). In particular, the mIoU reaches 84.96% and 87.64%, respectively, which verifies the strong ability of the proposed DODNet to handle segmentation problems with complex background interference, large intra-class differences, and obvious inter-class similarities.
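The subtraction idea behind the MSPM can be sketched as follows. The abstract gives no equations, so the function names and the exact fusion are assumptions; the core idea illustrated is that upsampling the coarser feature map and taking the element-wise absolute difference with the finer one exposes the spatial detail the coarse map has lost.

```python
import numpy as np

def nearest_upsample(x, factor):
    """Nearest-neighbour upsampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def subtraction_perception(fine, coarse):
    """Hypothetical sketch of one subtraction unit in the MSPM:
    the coarser map is upsampled to the finer resolution, and the
    element-wise absolute difference highlights spatial detail
    present in the fine map but missing from the coarse one."""
    factor = fine.shape[1] // coarse.shape[1]
    up = nearest_upsample(coarse, factor)
    return np.abs(fine - up)

# Toy multi-scale features: 4 channels at 8x8 and 4x4 resolutions.
rng = np.random.default_rng(0)
f_fine = rng.standard_normal((4, 8, 8))
f_coarse = rng.standard_normal((4, 4, 4))
detail = subtraction_perception(f_fine, f_coarse)
print(detail.shape)  # (4, 8, 8)
```

In a real network the difference map would typically pass through further convolutions before fusion; that stage is omitted here.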
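The two-stage attention of the global branch (coarse region-to-region routing, then fine-grained attention restricted to the top-k retained regions) can be sketched roughly as below. This is a simplified single-head NumPy illustration, not the paper's implementation; the region partition, the top-k value, and the pooling used for region descriptors are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bi_level_routing_attention(q, k, v, num_regions, topk):
    """Simplified single-head sketch of bi-level routing attention:
    1) pool tokens into regions and score region-to-region affinity;
    2) for each query region, keep only its top-k key regions;
    3) run ordinary attention restricted to the retained tokens."""
    n, d = q.shape
    r = n // num_regions  # tokens per region
    # Region-level descriptors via average pooling.
    qr = q.reshape(num_regions, r, d).mean(axis=1)
    kr = k.reshape(num_regions, r, d).mean(axis=1)
    affinity = qr @ kr.T                               # (regions, regions)
    routes = np.argsort(-affinity, axis=1)[:, :topk]   # top-k key regions
    out = np.zeros_like(q)
    for i in range(num_regions):
        keep = np.concatenate(
            [np.arange(j * r, (j + 1) * r) for j in routes[i]])
        qi = q[i * r:(i + 1) * r]                      # queries of region i
        attn = softmax(qi @ k[keep].T / np.sqrt(d))
        out[i * r:(i + 1) * r] = attn @ v[keep]
    return out

rng = np.random.default_rng(0)
q = rng.standard_normal((16, 8))
k = rng.standard_normal((16, 8))
v = rng.standard_normal((16, 8))
y = bi_level_routing_attention(q, k, v, num_regions=4, topk=2)
print(y.shape)  # (16, 8)
```

Because each query attends to only `topk * r` tokens rather than all `n`, the fine-grained stage skips low-affinity background regions, which is the efficiency argument made in the abstract.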
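The abstract describes the UFE only as a channel-spatial attention module without internals, so the sketch below follows the generic channel-then-spatial attention pattern purely as an illustration; the descriptors and gating used here are assumptions, not the paper's design.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_spatial_attention(x):
    """Illustrative channel-then-spatial attention on a (C, H, W) map:
    first reweight channels by their global spatial statistics, then
    reweight spatial positions by their cross-channel statistics."""
    # Channel attention: squeeze spatial dims, gate each channel.
    ch_desc = x.mean(axis=(1, 2)) + x.max(axis=(1, 2))   # (C,)
    x = x * sigmoid(ch_desc)[:, None, None]
    # Spatial attention: squeeze channels, gate each position.
    sp_desc = x.mean(axis=0) + x.max(axis=0)             # (H, W)
    return x * sigmoid(sp_desc)[None, :, :]

rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 8, 8))
out = channel_spatial_attention(feat)
print(out.shape)  # (4, 8, 8)
```

A learned module would replace the raw mean/max descriptors with small convolutional or fully connected layers; the fixed statistics here only show the two-stage gating structure.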

