Citation: DENG Feiyue, CAI Yulong, WANG Rui, ZHENG Shouxi. Train wheelset bearing damage identification method based on convolution and transformer fusion framework[J]. Chinese Journal of Engineering. DOI: 10.13374/j.issn2095-9389.2024.01.02.003


Train wheelset bearing damage identification method based on convolution and transformer fusion framework

  • Abstract: To address the insensitive image feature extraction, heavy reliance on expert experience, and low recognition accuracy of traditional machine vision methods in train wheelset bearing damage detection, this paper proposes a damage identification method for train wheelset bearings based on a convolution-Transformer fusion framework. First, an image enhancement category reorganization preprocessing method is developed to eliminate the influence of class imbalance among data samples and improve the quality of the image dataset. Second, following the idea of fusing convolution with self-attention, a VGG and Transformer dual-branch parallel fusion network (VTPF-Net) is designed to jointly capture the global contour features and the local detail features of the image. Third, a multiscale dilation spatial pyramid convolution (MDSPC) module is constructed to fully mine the multiscale semantic features in the feature map through progressive fusion of multiscale dilated convolutions. Finally, experimental analyses are conducted on the NEU-DET image defect dataset and a self-built train wheelset bearing image dataset. The results show that the proposed model achieves recognition accuracies of 99.44% on the six defect classes of NEU-DET and 98% on the four fault classes of the wheelset bearings, identifies image samples of different damage types accurately, and outperforms current CNN models, the self-attention-based ViT model, and CNN-Transformer fusion models on all evaluation metrics without a notable increase in model complexity.
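
To make the dual-branch design concrete, below is a minimal PyTorch sketch of a VGG-style convolution branch (local detail features) running in parallel with a patch-embedding Transformer encoder branch (global contour features), with the two feature vectors concatenated for classification. The layer widths, patch size, encoder depth, and concatenation-based fusion are illustrative assumptions, not the exact settings of VTPF-Net.

```python
import torch
import torch.nn as nn


class VTPFNetSketch(nn.Module):
    """Hypothetical skeleton of a VGG/Transformer parallel fusion classifier.

    Branch A: a small VGG-style convolution stack for local detail features.
    Branch B: patch embedding + Transformer encoder for global context.
    The two feature vectors are concatenated and classified.
    """

    def __init__(self, num_classes=4, img_size=224, patch=16, dim=192):
        super().__init__()
        # Convolutional (VGG-style) branch: local detail features.
        self.conv = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        # Transformer (ViT-style) branch: global contour/context features.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        num_patches = (img_size // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, num_patches, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=4, dim_feedforward=4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        # Fusion head: concatenate local and global descriptors.
        self.head = nn.Linear(256 + dim, num_classes)

    def forward(self, x):
        local_feat = self.conv(x).flatten(1)                        # (B, 256)
        tokens = self.patch_embed(x).flatten(2).transpose(1, 2)     # (B, N, dim)
        global_feat = self.encoder(tokens + self.pos).mean(dim=1)   # (B, dim)
        return self.head(torch.cat([local_feat, global_feat], dim=1))


if __name__ == "__main__":
    model = VTPFNetSketch(num_classes=4)
    print(model(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2, 4])
```

Concatenation is only one plausible fusion rule; element-wise addition or cross-attention between the two branches would be reasonable alternatives under the same parallel-fusion idea.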

     

    Abstract: To address the insensitive image feature extraction, heavy reliance on expert experience, and low recognition accuracy of traditional machine vision methods in train wheelset bearing damage detection, this paper proposes an identification method for train wheelset bearing damage based on a fused convolution-Transformer framework. First, because train bearing images are complex and their class imbalance is severe, an image preprocessing method termed image enhancement category reorganization is developed to improve the quality of the acquired image dataset and eliminate the effect of the unbalanced classes. Second, a convolutional neural network (CNN) achieves high model construction and training efficiency through local receptive fields and weight sharing, but each layer perceives only local neighborhoods, which limits its ability to capture global feature information. The Transformer, a network built on the self-attention mechanism, offers strong parallel computing capability and learns long-range dependencies between image pixels over the global scope, giving it more powerful global information extraction, yet its ability to mine local image features is insufficient. Therefore, this paper presents a VGG and Transformer parallel fusion network (VTPF-Net) that, based on the fusion of convolution and self-attention, integrates the global contour features and local detail features of the image. Furthermore, a multiscale dilation spatial pyramid convolution (MDSPC) module is constructed to fully mine the multiscale semantic features in the feature map through progressive fusion of multiscale dilated convolutions, effectively alleviating the feature information loss caused by the gridding effect of dilated convolution. Additionally, a coordinate attention (CA) module embedded after the MDSPC module captures long-range dependencies and precise positional relationships of the feature maps along two spatial directions, allowing the network to focus more accurately on specific regions of the feature map. Finally, experimental analyses were conducted on the NEU-DET surface defect dataset and a self-constructed train wheelset bearing image dataset. The results demonstrate that the proposed model achieves recognition accuracies of 99.44% on the six defect classes of NEU-DET and 98% on the four fault classes of the wheelset bearing dataset, and its feature extraction capability is further verified using model visualization methods. Compared with existing CNN models, the self-attention-based ViT model, and CNN-Transformer fusion models, the proposed method achieves significantly better evaluation metrics and accurately identifies different types of image samples without significantly increasing model complexity.
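
The two modules highlighted above, MDSPC and the coordinate attention block that follows it, can be sketched in PyTorch as below. The dilation rates, the sequential "progressive" chaining of the dilated branches, and the CA reduction ratio are assumptions made for illustration; only the overall structure (parallel dilated convolutions fused by a 1x1 convolution, then attention pooled separately along the height and width axes) follows the description in the abstract.

```python
import torch
import torch.nn as nn


class MDSPC(nn.Module):
    """Sketch of multiscale dilation spatial pyramid convolution.

    3x3 branches with growing dilation rates; each branch reuses the previous
    branch's output ("progressive fusion"), one way to counter the gridding
    effect of plain dilated convolution. Dilation rates (1, 2, 4, 8) and the
    chaining rule are assumptions, not the paper's exact settings.
    """

    def __init__(self, channels, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d, bias=False),
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True),
            )
            for d in dilations
        ])
        # 1x1 convolution fuses the concatenated multiscale responses.
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, x):
        outs, prev = [], x
        for branch in self.branches:
            prev = branch(prev)  # feed the previous scale into the next branch
            outs.append(prev)
        return self.fuse(torch.cat(outs, dim=1))


class CoordinateAttention(nn.Module):
    """Coordinate attention (Hou et al., 2021) placed after MDSPC: pooling
    along H and W separately yields attention maps that capture long-range
    dependencies in one spatial direction while preserving positions in the
    other. The reduction ratio is an assumed hyperparameter."""

    def __init__(self, channels, reduction=32):
        super().__init__()
        hidden = max(8, channels // reduction)
        self.conv1 = nn.Conv2d(channels, hidden, 1)
        self.bn = nn.BatchNorm2d(hidden)
        self.act = nn.ReLU(inplace=True)
        self.conv_h = nn.Conv2d(hidden, channels, 1)
        self.conv_w = nn.Conv2d(hidden, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        xh = x.mean(dim=3, keepdim=True)                      # (B, C, H, 1)
        xw = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # (B, C, W, 1)
        y = self.act(self.bn(self.conv1(torch.cat([xh, xw], dim=2))))
        yh, yw = torch.split(y, [h, w], dim=2)
        ah = torch.sigmoid(self.conv_h(yh))                      # attend along H
        aw = torch.sigmoid(self.conv_w(yw.permute(0, 1, 3, 2)))  # attend along W
        return x * ah * aw


if __name__ == "__main__":
    block = nn.Sequential(MDSPC(256), CoordinateAttention(256))
    print(block(torch.randn(2, 256, 28, 28)).shape)  # torch.Size([2, 256, 28, 28])
```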

     
