Deng Feiyue, Cai Yulong, Wang Rui, Zheng Shouxi. Train wheelset bearing damage identification method based on convolution and Transformer fusion framework[J]. Chinese Journal of Engineering. DOI: 10.13374/j.issn2095-9389.2024.01.02.003

Train wheelset bearing damage identification method based on convolution and Transformer fusion framework

  • Abstract: To address the problems of insensitive image feature extraction, heavy reliance on expert experience, and low recognition accuracy that traditional machine-vision methods face in train wheelset bearing damage detection, this paper proposes a train wheelset bearing damage identification method based on a convolution and Transformer fusion framework. First, an image-enhancement and category-reorganization preprocessing method is developed to improve the quality of the train wheelset bearing image dataset and eliminate the effect of class imbalance across data samples. Second, based on the idea of fusing convolution with self-attention, a VGG and Transformer parallel fusion network (VTPF-Net) is designed to jointly capture the global contour features and local detail features of an image. Third, a multiscale dilation spatial pyramid convolution (MDSPC) module is constructed, which uses progressive fusion of multiscale dilated convolutions to fully mine the multiscale semantic features in the feature map. Finally, experiments are conducted on the NEU-DET image defect dataset and a self-built train wheelset bearing image dataset. The results show that the proposed model achieves recognition accuracies of 99.44% on the six defect classes in NEU-DET and 98.96% on the four wheelset bearing image classes, identifies image samples of different damage types fairly accurately, and significantly outperforms current CNN models, the self-attention ViT model, and CNN-Transformer fusion models on all evaluation metrics.

     

    Abstract: To address the insensitive image feature extraction, high demand for expert experience, and low recognition accuracy of traditional machine-vision methods in train wheelset bearing damage detection, this paper proposes a damage identification method for train wheelset bearings based on a convolution and Transformer fusion framework. First, an image preprocessing method named image enhancement category reorganization is used to improve the quality of the acquired image dataset and to eliminate the effects of class imbalance. Second, owing to its local receptive field and weight-sharing strategy, a convolutional neural network (CNN) is efficient to build and train, but it senses only local neighborhoods and has limited ability to capture global feature information. The Transformer, a network model based on the self-attention mechanism, offers strong parallel computing capability and learns long-range dependencies between image pixels over the whole image, giving it stronger global information extraction ability. Accordingly, based on the idea of fusing convolution with self-attention, this paper designs the VGG and Transformer parallel fusion network (VTPF-Net), which integrates the global contour features and local details of an image. Furthermore, the multiscale dilation spatial pyramid convolution (MDSPC) module is constructed to fully mine the multiscale semantic features in the feature map through progressive fusion of multiscale dilated convolutions. Finally, experiments were carried out on the NEU-DET image defect dataset and the self-built train wheelset bearing image dataset. The results demonstrate that the proposed model attains recognition accuracies of 99.44% for the six defect types in NEU-DET and 98.96% for the four wheelset bearing image classes. Compared with existing CNN models, the self-attention ViT model, and CNN-Transformer fusion models, the proposed method achieves significantly better evaluation metrics and accurately identifies different types of image samples.
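The MDSPC module described above builds on a basic property of dilated convolution: inserting gaps between kernel taps enlarges the receptive field without adding parameters, so parallel branches with different dilation rates see the same input at different scales. The sketch below is a minimal pure-Python illustration of that property only, not the paper's implementation; all names (`dilated_conv1d`, the dilation rates 1/2/4) are our own illustrative choices.

```python
# Toy illustration of the multiscale-dilation idea: one 3-tap kernel,
# reused unchanged across parallel branches with growing dilation rates.

def dilated_conv1d(x, kernel, dilation):
    """Valid-mode 1-D convolution whose taps are `dilation` samples apart."""
    k = len(kernel)
    span = (k - 1) * dilation + 1  # effective receptive field per output
    return [
        sum(kernel[j] * x[start + j * dilation] for j in range(k))
        for start in range(len(x) - span + 1)
    ]

x = [float(i) for i in range(16)]
kernel = [1.0, 1.0, 1.0]  # same weights at every scale: no extra parameters

# Parallel branches with growing dilation, as in a spatial-pyramid layout.
branches = {d: dilated_conv1d(x, kernel, d) for d in (1, 2, 4)}

for d, y in branches.items():
    span = (len(kernel) - 1) * d + 1
    print(d, span, len(y))  # → 1 3 14, then 2 5 12, then 4 9 8
```

Each branch keeps the same three weights but covers (k−1)·d+1 input positions, so fusing the branch outputs mixes fine local detail with wider context; this is the multiscale behavior a spatial-pyramid module of dilated convolutions exploits in 2-D.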

     
