Abstract:
Accurate recognition of zinc flotation working conditions helps optimize the zinc flotation process and improve its efficiency. Currently, the recognition of zinc flotation working conditions relies on manual visual observation of the froth by experienced operators, which is easily influenced by subjective factors and makes it difficult to recognize flotation working conditions objectively and accurately. To solve this problem, machine vision is used to investigate the relationship between the visual features of zinc flotation froth and working conditions, and a sparse attention convolution-ViT model is proposed to recognize the working condition of zinc flotation. Firstly, because convolutional neural networks (CNN) are good at extracting local features and vision transformers (ViT) are good at extracting global features, the sparse attention convolution-ViT model combines the structures of CNN and ViT so that it can extract both local features, such as froth size, texture, and color, and global features, such as the distribution of froth sizes. Secondly, a sparse multi-head attention mechanism is introduced into ViT to fully process the global features of froth images. In the sparse multi-head attention mechanism, each attention head processes the global features of the froth image with a different level of sparsity, which yields more generalized froth features. Finally, an attention gated unit is proposed to process the features further. It adaptively adjusts the weight assigned to each feature in the image, which enhances the interpretability of the model, optimizes feature transfer, and captures essential features effectively. Experimental results show that the sparse attention convolution-ViT model can recognize zinc flotation conditions accurately: its recognition accuracy on the zinc flotation froth image data set reaches 88.62%, which is higher than that of traditional CNN and ViT models.
Ablation experiments verify the necessity of the sparse multi-head attention mechanism and the attention gated unit, which improve the recognition accuracy by 0.92% and 2.63%, respectively. Grad-CAM (Gradient-weighted Class Activation Mapping) is used to visualize the weights of different features, verifying that the proposed sparse attention convolution-ViT model characterizes the froth image with both local and global features. These results show that the proposed model can accurately recognize zinc flotation working conditions and has good application value.
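
As an illustration of the idea that each attention head processes the image with a different level of sparsity, the following minimal sketch may help. It is not the model proposed here: the abstract does not state how sparsity is imposed, so the sketch assumes a top-k rule (each query keeps only its `keep` highest-scoring keys), uses identity projections instead of learned query/key/value weights, and all function names (`topk_sparse_attention`, `sparse_multi_head`) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax; entries equal to -inf receive weight 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topk_sparse_attention(q, k, v, keep):
    """Single-head scaled dot-product attention with top-k sparsity.

    Assumption (not from the paper): sparsity means each query attends only
    to its `keep` highest-scoring keys. Keys tied with the keep-th score are
    also retained.
    """
    d = len(q[0])
    outputs, weights = [], []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        # Mask every score below the keep-th largest one.
        threshold = sorted(scores, reverse=True)[keep - 1]
        masked = [s if s >= threshold else -math.inf for s in scores]
        w = softmax(masked)
        weights.append(w)
        outputs.append(
            [sum(wj * vj[c] for wj, vj in zip(w, v)) for c in range(len(v[0]))]
        )
    return outputs, weights


def sparse_multi_head(tokens, num_heads, keeps):
    """Split token features into heads and give each head its own sparsity.

    Simplification: identity projections are used, so each head just sees
    its own slice of the feature vector.
    """
    d_head = len(tokens[0]) // num_heads
    head_outputs = []
    for h, keep in zip(range(num_heads), keeps):
        part = [t[h * d_head:(h + 1) * d_head] for t in tokens]
        out, _ = topk_sparse_attention(part, part, part, keep)
        head_outputs.append(out)
    # Concatenate the head outputs token by token.
    return [
        sum((head_outputs[h][i] for h in range(num_heads)), [])
        for i in range(len(tokens))
    ]


# Four hypothetical patch tokens with four features each.
tokens = [
    [1.0, 0.0, 0.5, 0.2],
    [0.0, 1.0, 0.1, 0.9],
    [1.0, 1.0, 0.3, 0.3],
    [0.2, 0.1, 0.8, 0.4],
]
# Head 0 is very sparse (keep 1 key), head 1 is nearly dense (keep 3 keys).
mixed = sparse_multi_head(tokens, num_heads=2, keeps=[1, 3])
```

Under these assumptions, a small `keep` makes a head focus on the few most similar patches, while a large `keep` behaves like ordinary dense attention, so combining heads with different values mixes narrow and broad views of the froth image.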