Yang B C, Wang J Y, Jin H B. Combining Swin Transformer and interactive fusion attention for automatic vascular segmentation of retinal images[J]. Chinese Journal of Engineering. DOI: 10.13374/j.issn2095-9389.2023.06.27.004

Combining Swin Transformer and interactive fusion attention for automatic vascular segmentation of retinal images

  • Abstract: Accurate retinal vessel segmentation is of great significance in assisting the screening of various diseases. However, existing methods still face challenges such as the inability to establish long-range dependencies and global contextual connections, as well as the low contrast between retinal vessel branch terminals and the background. To address these issues, this paper proposes a new retinal vessel segmentation framework called Dual Swin Transformer Fusion (DS-TransFusion). First, a dual-scale encoder subnetwork based on the Swin Transformer is adopted to extract coarse-grained and fine-grained features at different semantic scales. Second, a Transformer interactive fusion attention (TIFA) module is introduced at the skip connections for rich cross-view context modeling and semantic dependencies, addressing the key problem of capturing long-term correlations between data from different image views. Finally, a multi-scale attention (MSA) module is proposed between the encoder and decoder to collect global correspondences of multi-scale feature representations. The proposed method is evaluated on the public datasets STARE, CHASEDB1, and DRIVE, where the area under the receiver operating characteristic (ROC) curve and the accuracy (Acc) reach 98.3%/96.8%, 98.5%/97.2%, and 98.3%/96.5%, respectively. The experimental results demonstrate significant improvements in the accuracy, sensitivity, and specificity of retinal vessel segmentation, outperforming existing state-of-the-art methods.
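
The abstract names three building blocks (a dual-scale Swin Transformer encoder, TIFA at the skip connections, and MSA between encoder and decoder) without giving their internals. The minimal PyTorch sketch below illustrates only the described data flow; every module in it is an illustrative stand-in under stated assumptions, not the authors' implementation: strided convolutions replace the two Swin Transformer branches, a bidirectional cross-attention block stands in for TIFA (matching the "cross-view context modeling" role), and plain self-attention over the fused tokens stands in for MSA.

```python
# Hypothetical sketch of the DS-TransFusion data flow described in the
# abstract. All module names, scales, and internals are assumptions for
# illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossAttentionFusion(nn.Module):
    """Stand-in for the TIFA module: each branch attends to the other
    (cross-view attention), and the two results are summed."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn_ab = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_ba = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, a, b):                      # a, b: (B, N, C)
        fused_a, _ = self.attn_ab(a, b, b)        # coarse branch queries fine
        fused_b, _ = self.attn_ba(b, a, a)        # fine branch queries coarse
        return fused_a + fused_b


class DSTransFusionSketch(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        # Two encoder branches at different patch scales (assumed 4x and 8x);
        # strided convolutions stand in for the Swin Transformer stages.
        self.enc_fine = nn.Conv2d(3, dim, kernel_size=4, stride=4)
        self.enc_coarse = nn.Conv2d(3, dim, kernel_size=8, stride=8)
        self.tifa = CrossAttentionFusion(dim)
        # Stand-in for MSA: self-attention over the fused multi-scale tokens.
        self.msa = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.head = nn.Conv2d(dim, 1, kernel_size=1)  # vessel probability map

    def forward(self, x):                          # x: (B, 3, H, W)
        B, _, H, W = x.shape
        fine = self.enc_fine(x)                    # (B, C, H/4, W/4)
        coarse = self.enc_coarse(x)                # (B, C, H/8, W/8)
        # Upsample the coarse branch so both token grids align.
        coarse = F.interpolate(coarse, size=fine.shape[2:],
                               mode="bilinear", align_corners=False)
        to_tokens = lambda t: t.flatten(2).transpose(1, 2)  # (B, N, C)
        fused = self.tifa(to_tokens(fine), to_tokens(coarse))
        fused, _ = self.msa(fused, fused, fused)
        fused = fused.transpose(1, 2).reshape(B, -1, *fine.shape[2:])
        logits = self.head(fused)                  # (B, 1, H/4, W/4)
        return torch.sigmoid(F.interpolate(logits, size=(H, W),
                                           mode="bilinear",
                                           align_corners=False))


if __name__ == "__main__":
    model = DSTransFusionSketch()
    out = model(torch.randn(1, 3, 128, 128))
    print(out.shape)  # torch.Size([1, 1, 128, 128])
```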

     
