Accurate retinal vessel segmentation is of great significance in assisting the screening of various diseases. However, existing methods still struggle to establish long-range dependencies and global contextual connections, and the low contrast between retinal vessel branch terminals and the background remains a challenge. To address these issues, this paper proposes a new retinal vessel segmentation framework called Dual Swin Transformer Fusion (DS-TransFusion). First, a dual-scale encoder subnetwork based on the Swin Transformer extracts coarse-grained and fine-grained features at different semantic scales. Second, a Transformer Interaction Fusion Attention (TIFA) module is introduced at the skip connections to strengthen cross-view context modeling and semantic dependencies, addressing the key problem of capturing long-range correlations between features from different image views. Finally, a Multi-Scale Attention (MSA) module is placed between the encoder and decoder to gather global correspondences among multi-scale feature representations. The proposed method is evaluated on the public STARE, CHASEDB1, and DRIVE datasets, achieving area under the receiver operating characteristic curve (AUC) / accuracy (Acc) of 98.3%/96.8%, 98.5%/97.2%, and 98.3%/96.5%, respectively. The experimental results demonstrate significant improvements in the accuracy, sensitivity, and specificity of retinal vessel segmentation, outperforming existing state-of-the-art methods.
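The abstract does not specify the internals of the cross-view fusion; as a minimal illustrative sketch (all names and shapes are assumptions, not the authors' implementation), fusing tokens from a fine-grained branch with tokens from a coarse-grained branch can be expressed as single-head scaled dot-product cross-attention:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(q_tokens, kv_tokens):
    # Queries come from one view (e.g. the fine-grained branch);
    # keys/values come from the other (coarse-grained) branch, so every
    # fine token can attend to the full coarse-scale context.
    d = q_tokens.shape[-1]
    scores = q_tokens @ kv_tokens.T / np.sqrt(d)   # (N_fine, N_coarse)
    return softmax(scores, axis=-1) @ kv_tokens    # (N_fine, d)

rng = np.random.default_rng(0)
fine = rng.standard_normal((64, 32))    # hypothetical fine-scale tokens
coarse = rng.standard_normal((16, 32))  # hypothetical coarse-scale tokens
fused = cross_attention(fine, coarse)
print(fused.shape)  # (64, 32): one fused vector per fine-scale token
```

In practice a module like TIFA would add learned query/key/value projections, multiple heads, and residual connections; the sketch only shows the cross-view attention pattern that lets tokens from one semantic scale aggregate global context from the other.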