Yang B C, Wang J Y, Jin H B. Combining Swin Transformer and interactive fusion attention for automatic vascular segmentation of retinal images[J]. Chinese Journal of Engineering. DOI: 10.13374/j.issn2095-9389.2023.06.27.004

Combining Swin Transformer and interactive fusion attention for automatic vascular segmentation of retinal images

  • Abstract: Accurate retinal vessel segmentation is of great significance in assisting the screening of various diseases. However, existing methods still face challenges such as the inability to establish long-range dependencies and global contextual connections, as well as the low contrast between retinal vessel branch terminals and the background. To address these issues, this paper proposes a new retinal vessel segmentation framework called Dual Swin Transformer Fusion (DS-TransFusion). First, a dual-scale encoder subnetwork based on the Swin Transformer is adopted to extract coarse-grained and fine-grained features at different semantic scales. Second, a Transformer interactive fusion attention (TIFA) module is introduced at the skip connections for rich cross-view context modeling and semantic dependencies, addressing the key problem of capturing long-term correlations between data from different image views. Finally, a multi-scale attention (MSA) module is proposed between the encoder and decoder to collect global correspondences of multi-scale feature representations. The proposed method is evaluated on the public datasets STARE, CHASEDB1, and DRIVE, where the area under the receiver operating characteristic (ROC) curve and the accuracy (Acc) reach 98.3%/96.8%, 98.5%/97.2%, and 98.3%/96.5%, respectively. The experimental results demonstrate significant improvements in the accuracy, sensitivity, and specificity of retinal vessel segmentation, outperforming existing state-of-the-art methods.
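
The abstract names three building blocks (a dual-scale Swin Transformer encoder, TIFA at the skip connections, and MSA between encoder and decoder) without giving their internals. The minimal PyTorch sketch below illustrates only the described data flow; every module in it is an illustrative stand-in under stated assumptions, not the authors' implementation: strided convolutions replace the two Swin Transformer branches, a bidirectional cross-attention block stands in for TIFA (matching the "cross-view context modeling" role), and plain self-attention over the fused tokens stands in for MSA.

```python
# Hypothetical sketch of the DS-TransFusion data flow described in the
# abstract. All module names, scales, and internals are assumptions for
# illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossAttentionFusion(nn.Module):
    """Stand-in for the TIFA module: each branch attends to the other
    (cross-view attention), and the two results are summed."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn_ab = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_ba = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, a, b):                      # a, b: (B, N, C)
        fused_a, _ = self.attn_ab(a, b, b)        # coarse branch queries fine
        fused_b, _ = self.attn_ba(b, a, a)        # fine branch queries coarse
        return fused_a + fused_b


class DSTransFusionSketch(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        # Two encoder branches at different patch scales (assumed 4x and 8x);
        # strided convolutions stand in for the Swin Transformer stages.
        self.enc_fine = nn.Conv2d(3, dim, kernel_size=4, stride=4)
        self.enc_coarse = nn.Conv2d(3, dim, kernel_size=8, stride=8)
        self.tifa = CrossAttentionFusion(dim)
        # Stand-in for MSA: self-attention over the fused multi-scale tokens.
        self.msa = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.head = nn.Conv2d(dim, 1, kernel_size=1)  # vessel probability map

    def forward(self, x):                          # x: (B, 3, H, W)
        B, _, H, W = x.shape
        fine = self.enc_fine(x)                    # (B, C, H/4, W/4)
        coarse = self.enc_coarse(x)                # (B, C, H/8, W/8)
        # Upsample the coarse branch so both token grids align.
        coarse = F.interpolate(coarse, size=fine.shape[2:],
                               mode="bilinear", align_corners=False)
        to_tokens = lambda t: t.flatten(2).transpose(1, 2)  # (B, N, C)
        fused = self.tifa(to_tokens(fine), to_tokens(coarse))
        fused, _ = self.msa(fused, fused, fused)
        fused = fused.transpose(1, 2).reshape(B, -1, *fine.shape[2:])
        logits = self.head(fused)                  # (B, 1, H/4, W/4)
        return torch.sigmoid(F.interpolate(logits, size=(H, W),
                                           mode="bilinear",
                                           align_corners=False))


if __name__ == "__main__":
    model = DSTransFusionSketch()
    out = model(torch.randn(1, 3, 128, 128))
    print(out.shape)  # torch.Size([1, 1, 128, 128])
```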

     
