An ODE-YOLO-based model for detecting workers' safety wear in cement aggregate production workshops

LI Xin; HU Mangu; TONG Ruipeng

doi:10.13374/j.issn2095-9389.2025.03.31.003

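Abstract: Cement aggregate production workshops are harsh environments with hazards such as heavy dust and flying rock fragments, so workers must wear safety equipment such as helmets, masks, and reflective vests while on the job. To ensure compliant operation and protect workers, safety wear detection has become an industry consensus. AI video analysis is one effective means of detecting workers' safety wear, but it faces the challenges of multi-scale and small targets, and existing detection algorithms still suffer from low detection accuracy, high miss rates, and poor real-time performance. To address this, this paper proposes ODE-YOLO, a multi-scale small-object detection algorithm based on an improved YOLOv8. First, on top of the YOLOv8 baseline, the ODConv module is introduced to strengthen the network's feature extraction, and the EMA attention mechanism is adopted to improve the representation of multi-scale targets. Second, the iRMB architecture is used to restructure EMA, addressing the low post-processing efficiency that EMA introduces; the resulting iEMA attention method balances efficiency and performance. Finally, video images captured at different times of day and from different camera positions in the cement aggregate production workshop of a mine were used to build a dataset of 9,877 multi-scale small-object samples, on which the performance of ODE-YOLO was evaluated. Experiments with different attention mechanisms show that EMA is the most effective at improving feature extraction and representation for multi-scale targets. Ablation and comparison experiments show that ODE-YOLO effectively improves detection accuracy for multi-scale small targets: with a relatively small parameter count and computational cost, mAP@0.5 reaches 0.868 and the small-object detection precision AP@0.5mask reaches 0.722. The model also offers fast inference and low post-processing latency, enabling accurate real-time detection of workers' safety wear in cement aggregate production workshops.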

Abstract: In hazardous environments such as cement aggregate production plants, workers are required to wear safety equipment including helmets, masks, and reflective vests to mitigate the risks of heavy dust and flying debris. However, non-compliance with safety gear requirements remains prevalent, contributing to frequent workplace accidents. Manual supervision proves inefficient due to environmental limitations. As a result, the deployment of AI-based video analysis for real-time safety wear detection has become increasingly vital. Yet, this task presents significant challenges, particularly due to the presence of small objects and multi-scale targets in complex scenes, which compromise detection accuracy, increase false negative rates, and hinder real-time performance. To address these issues, this study proposes a novel multi-scale small object detection algorithm, ODE-YOLO, built upon the YOLOv8 architecture. The core innovation lies in integrating the Omni-Dimensional Dynamic Convolution (ODConv) module into the shallow layers of the backbone to enhance feature extraction for small objects, and embedding an improved attention mechanism, iEMA (inverted Efficient Multi-scale Attention), within the neck network to strengthen multi-scale feature representation while preserving real-time inference performance. The EMA module, known for its multi-scale parallel structure and spatial attention capabilities, was modified using an inverted residual mobile block (iRMB) to form iEMA. This structure balances efficiency and accuracy by reusing features, reducing computation, and eliminating the need for complex matrix operations found in traditional self-attention mechanisms. The combination of ODConv and iEMA allows the model to better capture contextual cues across varying object scales, especially for hard-to-detect categories like masks and unhelmeted heads. 
A customized dataset comprising 9,877 labeled instances was created using surveillance footage from multiple workstations in a cement plant, covering various time periods and camera angles. This dataset included six categories: vest, no-vest, helmet, head, mask, and no-mask. Statistical analysis revealed a strong presence of small and scale-diverse targets, with some classes occupying less than 0.5% of the image area. Training was conducted using PyTorch 2.0.0 on an NVIDIA RTX 3090 GPU. A comprehensive series of experiments was carried out, including attention mechanism comparisons, ablation studies, and benchmarking against state-of-the-art models such as YOLOv5n, YOLOv10n, Faster R-CNN, Mask R-CNN, and RT-DETR-L. The results demonstrate that the proposed ODE-YOLO outperforms other YOLO variants and R-CNN models in terms of mean average precision (mAP@0.5 = 0.868) and small object detection precision (AP@0.5mask = 0.722), while maintaining a lightweight architecture (11.3 MB) and fast inference (2.2 ms/image). The iEMA attention mechanism outperformed other mainstream attention modules (SE, CBAM, CA), particularly in improving the precision of mask detection by 28.5% compared to the baseline. Ablation experiments confirmed the individual and combined contributions of ODConv and iEMA to both accuracy and speed, evidencing their synergistic effect. Visual inspection using real-world test images showed that ODE-YOLO achieved balanced detection across object scales without missed detections or misclassifications, making it highly suitable for real-time deployment in production environments. In conclusion, this study introduces a robust and efficient algorithm tailored for safety wear detection in industrial scenarios characterized by multi-scale and small object challenges. 
ODE-YOLO provides a practical tool for enhancing workplace safety supervision, offering timely alerts for non-compliance, and supporting safety management personnel in mitigating risks and preventing accidents.
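
The abstract notes that some classes occupy less than 0.5% of the image area. As an illustration only, the sketch below shows one way such a statistic could be computed. It assumes the annotations are stored in the common YOLO text format (`class_id cx cy w h`, normalized to the image size), which the source does not state. The class order in `CLASS_NAMES`, the helper names, and the sample labels are all hypothetical.

```python
from collections import defaultdict

# Hypothetical class order: the six categories come from the abstract,
# but their label indices are an assumption.
CLASS_NAMES = ["vest", "no-vest", "helmet", "head", "mask", "no-mask"]


def mean_area_fraction(label_lines):
    """Return {class_name: mean share of image area} for YOLO-format labels.

    Each line is assumed to be 'class_id cx cy w h' with values normalized
    to [0, 1], so w * h is the box's share of the image area.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for line in label_lines:
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        name = CLASS_NAMES[int(parts[0])]
        width, height = float(parts[3]), float(parts[4])
        totals[name] += width * height
        counts[name] += 1
    return {name: totals[name] / counts[name] for name in totals}


def small_classes(fractions, threshold=0.005):
    """Classes whose mean box area is below `threshold` (default 0.5%)."""
    return sorted(name for name, frac in fractions.items() if frac < threshold)


# Made-up sample labels, not taken from the paper's dataset.
labels = [
    "0 0.50 0.60 0.20 0.40",  # vest, 8% of the image
    "4 0.52 0.30 0.04 0.05",  # mask, 0.2% of the image
    "4 0.10 0.35 0.06 0.05",  # mask, 0.3% of the image
]
fractions = mean_area_fraction(labels)
print(small_classes(fractions))
```

On these sample labels only the mask class falls below the 0.5% threshold, which mirrors the kind of scale imbalance the abstract describes between vests and masks.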


LI Xin1), HU Mangu1), TONG Ruipeng1)