Abstract:
Semantic segmentation is an important technology for remote-sensing image processing and has been widely applied in many fields. Although existing remote-sensing image segmentation models, such as convolutional neural network (CNN)- and transformer-based methods, have achieved great success in this domain, several challenges remain, including the difficulty of fully preserving detailed feature maps in the encoder and of dynamically capturing global contextual information. To address these challenges, a novel remote-sensing image segmentation method, the dynamic optimized detail-aware network (DODNet), is proposed based on a CNN-transformer hybrid framework. First, a ResNeXt-50 network is employed as the backbone of the encoder, and a multi-subtraction perception module (MSPM) is designed to collect spatial detail differences between multiscale feature maps, effectively reducing redundant information. This module integrates multidirectional depth-wise separable convolutions with parallel dilated convolutions to enhance feature representation. By performing pixel-wise subtraction after upsampling and spatial alignment, difference feature maps are generated that capture regions of significant variation, effectively preserving boundaries and other detailed information in remote-sensing images while improving the model's perception of small objects. Then, a dynamic information fusion block (DIFB), which combines a global bi-level routing self-attention branch with a local attention branch to better capture global and local information, is designed for the decoder. The global bi-level routing self-attention branch uses a learnable regional routing network to filter out low-association background areas and then performs a fine-grained attention calculation within the retained semantic key windows. This scheme effectively addresses the dual challenges of background interference and computational efficiency in remote-sensing image segmentation. The local attention branch uses multiscale convolutions to compensate for local information that the global branch finds difficult to capture. Finally, a new channel-spatial attention module, the unified feature extractor (UFE), is proposed to obtain semantic and contextual information by serially fusing channel and spatial attention mechanisms. In the channel attention stage, a one-dimensional depth-wise separable convolution extracts channel features, and dual-path average pooling along the width and height directions replaces traditional global pooling. In the spatial attention stage, a multiscale convolution fusion strategy is introduced, and spatial attention weights are generated through instance normalization, so the module attends more closely to local features and foreground objects. To verify the effectiveness and accuracy of the proposed method, comparative experiments and ablation studies were designed and conducted on three open benchmark datasets: Vaihingen, Potsdam, and LoveDA. Quantitative and visual analyses of the results show that DODNet outperforms ten state-of-the-art segmentation methods in terms of the F1 score, overall accuracy (OA), and mean intersection over union (mIoU).
In particular, the mIoU values reached 84.96%, 87.64%, and 52.43% on Vaihingen, Potsdam, and LoveDA, respectively, verifying the strong ability of the proposed DODNet to handle segmentation problems involving complex background interference, large intra-class variance, and high inter-class similarity. Minimal illustrative sketches of the three proposed modules follow.
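To make the MSPM's subtraction step concrete, the following is a minimal PyTorch sketch of pixel-wise subtraction between spatially aligned multiscale features. The class name, channel width, 1x1 alignment projections, the absolute value, and the depth-wise separable refinement are our assumptions for illustration; the abstract only specifies upsampling, spatial alignment, and pixel-wise subtraction.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubtractionPerception(nn.Module):
    """Minimal sketch of the MSPM's difference step (names/channels assumed).

    Upsamples a coarse feature map to a fine map's resolution, subtracts
    pixel-wise so that regions of significant variation (boundaries,
    small objects) stand out, then lightly refines the difference map.
    """

    def __init__(self, channels: int = 64):
        super().__init__()
        # Project both inputs to a common channel width before subtraction
        # (assumed; the abstract only states that the maps are aligned).
        self.align_fine = nn.Conv2d(channels, channels, kernel_size=1)
        self.align_coarse = nn.Conv2d(channels, channels, kernel_size=1)
        # Depth-wise separable 3x3 refinement of the difference map (assumed).
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, 1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, fine: torch.Tensor, coarse: torch.Tensor) -> torch.Tensor:
        # Upsample the coarse map so both maps are spatially aligned.
        coarse_up = F.interpolate(coarse, size=fine.shape[-2:],
                                  mode="bilinear", align_corners=False)
        # Pixel-wise subtraction highlights where the two scales disagree.
        diff = torch.abs(self.align_fine(fine) - self.align_coarse(coarse_up))
        return self.refine(diff)
```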
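The DIFB's global branch is described as bi-level routing self-attention: a coarse routing stage keeps only the most associated windows, and token-level attention then runs inside those retained windows. The sketch below follows that two-stage pattern; the window size, top-k value, mean-pooled region descriptors, and single-head formulation are assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class BiLevelRoutingAttention(nn.Module):
    """Minimal sketch of the DIFB's global branch (hyper-parameters assumed).

    Step 1 (routing): region-level queries/keys score how strongly each
    window is associated with every other window; only the top-k regions
    are kept, filtering out low-association background.
    Step 2 (fine-grained): token-level attention is computed only inside
    the retained windows, bounding the computational cost.
    """

    def __init__(self, dim: int = 64, window: int = 8, topk: int = 4):
        super().__init__()
        self.window, self.topk, self.scale = window, topk, dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, H, W, C) with H and W divisible by the window size.
        B, H, W, C = x.shape
        w = self.window
        nH, nW = H // w, W // w
        n_reg, reg_len = nH * nW, w * w
        # Partition into non-overlapping windows: (B, n_reg, w*w, C).
        x = x.view(B, nH, w, nW, w, C).permute(0, 1, 3, 2, 4, 5)
        x = x.reshape(B, n_reg, reg_len, C)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Region-level routing: mean-pooled queries/keys per window.
        q_reg, k_reg = q.mean(dim=2), k.mean(dim=2)        # (B, n_reg, C)
        affinity = q_reg @ k_reg.transpose(1, 2)           # (B, n_reg, n_reg)
        idx = affinity.topk(self.topk, dim=-1).indices     # keep top-k regions
        # Gather keys/values of the retained regions for every query window.
        idx_exp = idx[..., None, None].expand(-1, -1, -1, reg_len, C)
        k_g = torch.gather(k[:, None].expand(-1, n_reg, -1, -1, -1), 2, idx_exp)
        v_g = torch.gather(v[:, None].expand(-1, n_reg, -1, -1, -1), 2, idx_exp)
        k_g = k_g.reshape(B, n_reg, self.topk * reg_len, C)
        v_g = v_g.reshape(B, n_reg, self.topk * reg_len, C)
        # Fine-grained attention inside the retained windows only.
        attn = (q @ k_g.transpose(-2, -1) * self.scale).softmax(dim=-1)
        out = self.proj(attn @ v_g)                        # (B, n_reg, w*w, C)
        out = out.view(B, nH, nW, w, w, C).permute(0, 1, 3, 2, 4, 5)
        return out.reshape(B, H, W, C)
```

For instance, `BiLevelRoutingAttention()(torch.randn(2, 32, 32, 64))` attends within 16 windows per image while each window only reads the 4 windows it routes to, rather than all 16.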
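The UFE is described as serial channel and spatial attention, with dual-path average pooling along the width and height directions and a one-dimensional depth-wise separable convolution in the channel stage, followed by multiscale convolution fusion with instance normalization in the spatial stage. The sketch below assembles those stated pieces; the kernel sizes, the concatenation of the two pooled paths, and the sigmoid gating are our assumptions.

```python
import torch
import torch.nn as nn

class UnifiedFeatureExtractor(nn.Module):
    """Minimal sketch of the UFE (kernel sizes and fusion details assumed).

    Channel stage: average-pool along width and along height separately
    (dual-path pooling instead of one global pool), extract channel
    features with a 1-D depth-wise separable convolution, and gate the
    input channels. Spatial stage: fuse multiscale convolutions, then
    produce spatial weights through instance normalization.
    """

    def __init__(self, channels: int = 64):
        super().__init__()
        # 1-D depth-wise separable conv over the pooled channel sequences.
        self.dw1d = nn.Conv1d(channels, channels, 3, padding=1, groups=channels)
        self.pw1d = nn.Conv1d(channels, channels, 1)
        # Multiscale spatial branch (3x3 and 5x5 kernels assumed).
        self.spatial3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.spatial5 = nn.Conv2d(channels, channels, 5, padding=2)
        self.inorm = nn.InstanceNorm2d(channels, affine=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # --- Channel attention: dual-path pooling along H and along W. ---
        pool_h = x.mean(dim=3)                         # (B, C, H): over width
        pool_w = x.mean(dim=2)                         # (B, C, W): over height
        feat = torch.cat([pool_h, pool_w], dim=2)      # (B, C, H+W)
        feat = self.pw1d(self.dw1d(feat))              # 1-D depthwise-separable
        ch_attn = torch.sigmoid(feat.mean(dim=2, keepdim=True))  # (B, C, 1)
        x = x * ch_attn.unsqueeze(-1)                  # gate channels
        # --- Spatial attention: multiscale fusion + instance norm. ---
        sp = self.inorm(self.spatial3(x) + self.spatial5(x))
        return x * torch.sigmoid(sp)
```

Running the channel stage before the spatial stage matches the abstract's "serial fusion" of the two mechanisms; the instance-normalized spatial weights emphasize local features and foreground objects within each image.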