DENG Feiyue, CAI Yulong, WANG Rui, ZHENG Shouxi. Train wheelset bearing damage identification method based on convolution and transformer fusion framework[J]. Chinese Journal of Engineering, 2024, 46(10): 1834-1844. DOI: 10.13374/j.issn2095-9389.2024.01.02.003

Train wheelset bearing damage identification method based on convolution and transformer fusion framework

To address the issues of image feature insensitivity, heavy reliance on expert experience, and low recognition accuracy in traditional machine vision methods for train wheelset bearing damage detection, this paper proposes an identification method based on a convolution and transformer fusion network framework. First, because train bearing images are complex and their category imbalance is severe, an image preprocessing method called image enhancement category reorganization is used to improve the quality of the acquired image dataset and eliminate the effects of the imbalanced dataset. Second, a convolutional neural network (CNN) achieves high model construction and training efficiency by adopting local receptive fields and a weight-sharing strategy, but it perceives only local neighborhoods and has limited ability to capture global feature information. The transformer is a network model based on a self-attention mechanism; with strong parallel computing ability, it can learn long-range dependencies between image pixels at the global scale and offers more powerful global information extraction, but its ability to mine local image features is insufficient. Therefore, this paper presents a VGG and transformer parallel fusion network that integrates the global contour features and local details of the image through the fusion of convolution and self-attention. Furthermore, a multiscale dilated spatial pyramid convolution (MDSPC) module is constructed to fully mine the multiscale semantic features in the feature map through progressive fusion of multiscale dilated convolutions. The proposed module effectively alleviates the loss of feature information caused by the gridding effect of dilated convolution.
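The abstract does not specify the internal structure of the MDSPC module, so the following is only a minimal PyTorch sketch of the general idea it names: parallel 3×3 dilated convolutions whose outputs are progressively fused, so that each branch builds on the previous one and gaps left by a single dilation rate (the gridding effect) are filled in. The class name, dilation rates, and fusion details are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class MDSPC(nn.Module):
    """Sketch of a multiscale dilated spatial pyramid convolution block.

    Hypothetical structure: a chain of 3x3 dilated convolutions with
    increasing dilation rates; each stage is fused with its input
    (progressive fusion), and all stage outputs are summed and mixed
    by a 1x1 convolution.
    """

    def __init__(self, channels: int, rates=(1, 2, 3, 5)):
        super().__init__()
        # padding = dilation keeps the spatial size unchanged for k=3
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3,
                      padding=r, dilation=r)
            for r in rates
        )
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.zeros_like(x)
        prev = x
        for conv in self.branches:
            # progressive fusion: each branch refines the previous output
            prev = conv(prev) + prev
            out = out + prev
        return self.fuse(out)
```

Because padding equals dilation at every stage, the block preserves the feature-map size, so it can be dropped between backbone stages without reshaping.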
Additionally, embedding coordinate attention (CA) after the MDSPC module captures long-range dependencies and more precise positional relationships of the feature maps along two spatial directions, allowing the network to focus more accurately on specific regions of the feature map. Finally, experimental analyses were conducted on the NEU-DET surface defect image dataset and a self-constructed train wheelset bearing image dataset. The experimental results demonstrate that the proposed model achieves accuracies of 99.44% and 98% in recognizing the six defect types in NEU-DET and the four image types of wheelset bearings, respectively. The feature extraction capability of the proposed model was verified using model visualization methods. Compared with existing CNN models, the ViT model with a self-attention mechanism, and CNN-transformer fusion models, the proposed method achieves significantly better evaluation metrics and accurately identifies different types of image samples without significantly increasing model complexity.
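Coordinate attention, as described above, factorizes spatial attention into two 1D directions: features are pooled separately along height and width, encoded jointly, then split into two directional gates. A minimal PyTorch sketch of that standard formulation follows; the reduction ratio and layer names are assumptions for illustration, not taken from the paper.

```python
import torch
import torch.nn as nn


class CoordinateAttention(nn.Module):
    """Sketch of coordinate attention.

    Pools the input along width and along height to get two direction-
    aware descriptors, encodes them with a shared 1x1 convolution, then
    produces per-direction sigmoid gates that reweight the input.
    """

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        mid = max(channels // reduction, 4)  # assumed bottleneck width
        self.encode = nn.Conv2d(channels, mid, kernel_size=1)
        self.act = nn.ReLU()
        self.gate_h = nn.Conv2d(mid, channels, kernel_size=1)
        self.gate_w = nn.Conv2d(mid, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # direction-aware pooling: (b,c,h,1) and (b,c,w,1)
        pool_h = x.mean(dim=3, keepdim=True)
        pool_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)
        # joint encoding along the concatenated spatial axis
        y = self.act(self.encode(torch.cat([pool_h, pool_w], dim=2)))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        # per-direction attention gates
        a_h = torch.sigmoid(self.gate_h(y_h))                      # (b,c,h,1)
        a_w = torch.sigmoid(self.gate_w(y_w.permute(0, 1, 3, 2)))  # (b,c,1,w)
        return x * a_h * a_w  # broadcasts to (b,c,h,w)
```

The two gates broadcast over rows and columns respectively, which is how the module encodes "which rows" and "which columns" matter, i.e. the precise positional focus the abstract refers to.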