Sparse attention convolution-ViT model for working condition recognition in zinc flotation

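  • Abstract: Accurately recognizing zinc flotation working conditions and using them to guide operation can improve flotation efficiency and optimize the beneficiation process. At present, flotation plants mainly rely on operators who observe the froth by eye and judge the working condition from experience; this approach is highly subjective and cannot evaluate zinc flotation working conditions objectively and accurately. To address this problem, this paper studies the close relationship between the visual features of zinc flotation froth and the working conditions, and proposes a working condition recognition method based on a sparse attention convolution-ViT model. First, the proposed model combines the structures and advantages of convolutional neural networks (CNNs) and the vision transformer (ViT), perceiving both local spatial information and global information of the froth so as to represent froth images completely. Second, the model introduces a sparse multi-head attention mechanism in which each attention head processes features at a different degree of sparsity, perceiving global information at different scales, and introduces an attention-gated unit to optimize feature transfer, ultimately achieving zinc flotation working condition recognition. Experimental results show that the proposed method reaches an accuracy of 88.62% on the zinc flotation froth image dataset, addressing the problem that traditional CNN and ViT models cannot make full use of the global information in froth images and cannot adaptively capture their important features, and providing strong support for flotation process optimization.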

     

    Abstract: Accurate recognition of working conditions can optimize the zinc flotation process and improve its efficiency. Traditionally, this recognition relies heavily on manual observation of froth appearance, a method prone to human error and subjective judgment. To address this issue and improve recognition accuracy, a sparse attention convolution-ViT model is proposed. The model uses machine vision to investigate the relationship between froth visual features and working conditions, based on real-time froth images from industrial sites, and aims to recognize zinc flotation working conditions in real time, thereby providing guidance for operations. First, it combines the strengths of convolutional neural networks (CNNs) and vision transformers (ViT) to extract both local and global features from froth images. Specifically, CNNs are adept at capturing local features, such as the texture, color, and fine details of the froth, while ViT excels at identifying global features, such as the froth size distribution. By combining these two architectures, the sparse attention convolution-ViT model analyzes the froth images comprehensively. To enhance the global feature processing of froth images, a sparse multi-head attention mechanism is introduced into the ViT component. This mechanism allows the model to process global features at different sparsity levels, reducing computational cost and improving the model’s adaptability to different froth appearances. Each attention head in the sparse multi-head attention mechanism targets different aspects of the global features, allowing the model to extract varied information from the froth images while remaining efficient. Furthermore, an attention-gated unit is introduced to refine the feature processing. This unit adaptively weights the features extracted from the image, enhancing model interpretability and optimizing feature transfer. By emphasizing the relevant features, the attention-gated unit helps the model focus on the critical features of the froth images that indicate the working conditions. Experimental results demonstrate the effectiveness of the proposed sparse attention convolution-ViT model in recognizing zinc flotation working conditions. The model achieved a recognition accuracy of 88.62% on the zinc flotation froth image dataset, surpassing traditional CNN and ViT models. Ablation experiments highlighted the critical role of the sparse multi-head attention mechanism and the attention-gated unit, which contributed accuracy improvements of 0.92% and 2.63%, respectively. Moreover, gradient-weighted class activation mapping was used to visualize feature weights, confirming the model’s capability to characterize froth images by identifying both local and global features. This accurate recognition of zinc flotation conditions underscores the potential of the model to provide reliable real-time recognition and to support the optimization of the flotation process, thereby improving efficiency and resource utilization in zinc flotation.
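
The abstract does not say how the per-head sparsity or the gating is implemented, so the following is only a minimal sketch of the general idea, not the authors' model. It assumes top-k sparsification (each query keeps only its `keep` highest-scoring keys, with a different `keep` per head) and a plain sigmoid gate. All function names, the identity query/key/value projections, and the toy sizes are hypothetical choices made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list; -inf entries get weight 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topk_mask(scores, k):
    """Keep the k largest scores and set the rest to -inf (assumed sparsity rule)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(order[:k])
    return [s if i in keep else -math.inf for i, s in enumerate(scores)]


def sparse_head(q, k, v, keep):
    """One head: every query attends only to its `keep` best-matching keys."""
    d = len(q[0])
    out, weights = [], []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        w = softmax(topk_mask(scores, keep))
        weights.append(w)
        out.append([sum(wj * vj[c] for wj, vj in zip(w, v)) for c in range(len(v[0]))])
    return out, weights


def sparse_multi_head(tokens, n_heads, keep_per_head):
    """Split channels across heads; head h keeps keep_per_head[h] keys per query."""
    d_head = len(tokens[0]) // n_heads
    head_outs, head_weights = [], []
    for h in range(n_heads):
        part = [t[h * d_head:(h + 1) * d_head] for t in tokens]
        # Identity projections (q = k = v) keep the sketch short;
        # a trained model would use learned projection matrices instead.
        o, w = sparse_head(part, part, part, keep_per_head[h])
        head_outs.append(o)
        head_weights.append(w)
    # Concatenate the head outputs token by token.
    merged = [sum((head_outs[h][i] for h in range(n_heads)), []) for i in range(len(tokens))]
    return merged, head_weights


def gate(features, gate_logits):
    """Sigmoid gate: rescale every feature by a weight in (0, 1)."""
    return [[f / (1.0 + math.exp(-g)) for f, g in zip(fr, gr)]
            for fr, gr in zip(features, gate_logits)]


random.seed(0)
tokens = [[random.gauss(0, 1) for _ in range(8)] for _ in range(6)]  # 6 patch tokens, 8 channels
merged, head_weights = sparse_multi_head(tokens, n_heads=2, keep_per_head=[2, 6])
gated = gate(merged, merged)  # self-gating, purely illustrative
```

With `keep_per_head=[2, 6]`, the first head attends to only two tokens per query while the second attends to all six, which mirrors the idea of heads viewing global information at different sparsity levels. In a trained model the gate logits would come from a learned layer rather than from the features themselves.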

     
