Abstract:
Safety compliance, particularly the correct use of personal protective equipment (PPE), is critical in high-risk industrial settings such as cement aggregate production workshops, where harsh conditions and dynamic operations render traditional manual supervision insufficient. Artificial-intelligence-driven video surveillance offers a promising alternative, but existing object detection models frequently struggle to identify small and multiscale targets accurately, leading to high error rates and limited practical value. To address these limitations, this paper introduces ODEM–YOLO, a lightweight yet highly accurate object detection model based on an enhanced YOLOv8 architecture and engineered specifically for robust safety wear detection.

ODEM–YOLO incorporates three key innovations. First, the omni-dimensional dynamic convolution (ODConv) module is integrated into the early backbone stages. Unlike standard convolutions with fixed kernels, ODConv employs a multidimensional attention mechanism that dynamically learns kernel weights along the spatial, input-channel, output-channel, and kernel-number dimensions, enabling adaptive focus on the salient features of small targets in complex scenes and sharpening shallow-level feature discrimination. Second, the neck network is optimized with an improved efficient multiscale attention (iEMA) mechanism built around an inverted residual mobile block core: 1 × 1 pointwise convolutions handle channel manipulation while 3 × 3 depthwise convolutions learn spatial features efficiently, allowing multiscale contextual information to be captured and fused at significantly reduced computational cost and improving the representation of PPE at diverse sizes. Third, a novel C2f multiscale edge information enhancement (C2f_MSEIE) module replaces the original C2f blocks to explicitly strengthen target edge information for clearer boundary definition. It comprises a local convolution branch that preserves fine-grained details and a multiscale edge-modeling branch that applies AdaptiveAvgPool2d at multiple bin sizes together with an Edge Enhancer submodule to extract and reinforce high-frequency edge features, yielding a more robust representation of object contours for precise localization.
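To make the ODConv mechanism concrete, the following is a minimal PyTorch sketch of an ODConv-style dynamic convolution; the class name, attention activations, and reduction ratio are illustrative assumptions that simplify the published design rather than reproduce it.

```python
# Minimal sketch of an ODConv-style dynamic convolution (a hypothetical
# simplification of the module described above; the paper's exact design
# may differ). Four attention branches reweight candidate kernels along
# the spatial, input-channel, output-channel, and kernel-number
# dimensions before they are summed into one kernel per sample.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ODConvSketch(nn.Module):
    def __init__(self, c_in, c_out, k=3, n_kernels=4, reduction=16):
        super().__init__()
        self.c_in, self.c_out, self.k, self.n = c_in, c_out, k, n_kernels
        # Candidate kernels: (n, c_out, c_in, k, k)
        self.weight = nn.Parameter(torch.randn(n_kernels, c_out, c_in, k, k) * 0.02)
        hidden = max(c_in // reduction, 8)
        self.fc = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                nn.Linear(c_in, hidden), nn.ReLU(inplace=True))
        # One attention head per kernel dimension
        self.att_spatial = nn.Linear(hidden, k * k)      # per kernel position
        self.att_cin     = nn.Linear(hidden, c_in)       # per input channel
        self.att_cout    = nn.Linear(hidden, c_out)      # per output channel
        self.att_n       = nn.Linear(hidden, n_kernels)  # per candidate kernel

    def forward(self, x):
        b, _, h, w = x.shape
        z = self.fc(x)                                   # (b, hidden)
        a_s = torch.sigmoid(self.att_spatial(z)).view(b, 1, 1, 1, self.k, self.k)
        a_i = torch.sigmoid(self.att_cin(z)).view(b, 1, 1, self.c_in, 1, 1)
        a_o = torch.sigmoid(self.att_cout(z)).view(b, 1, self.c_out, 1, 1, 1)
        a_n = torch.softmax(self.att_n(z), dim=1).view(b, self.n, 1, 1, 1, 1)
        # Modulate the candidates and aggregate one kernel per sample
        w_dyn = (a_n * a_o * a_i * a_s * self.weight.unsqueeze(0)).sum(dim=1)
        # Grouped-conv trick: apply a different kernel to each batch sample
        x = x.reshape(1, b * self.c_in, h, w)
        w_dyn = w_dyn.reshape(b * self.c_out, self.c_in, self.k, self.k)
        y = F.conv2d(x, w_dyn, padding=self.k // 2, groups=b)
        return y.reshape(b, self.c_out, h, w)

# Example: ODConvSketch(64, 128)(torch.randn(2, 64, 80, 80)) -> (2, 128, 80, 80)
```

The grouped-convolution trick at the end applies a different aggregated kernel to every sample in the batch with a single conv2d call, which is what makes per-input dynamic kernels affordable.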
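The inverted residual core underlying iEMA can be sketched in the same spirit; the expansion ratio, normalization, and activation choices below are assumptions for illustration, not the paper's exact configuration.

```python
# Sketch of an inverted-residual core as attributed to the iEMA module
# (names and expansion ratio are illustrative assumptions): 1x1 pointwise
# convs handle channel expansion/projection, a 3x3 depthwise conv learns
# spatial features cheaply, and a residual path preserves the input.
import torch
import torch.nn as nn

class InvertedResidualCore(nn.Module):
    def __init__(self, channels, expand_ratio=2):
        super().__init__()
        hidden = channels * expand_ratio
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),    # 1x1 pointwise: expand
            nn.BatchNorm2d(hidden), nn.SiLU(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1,
                      groups=hidden, bias=False),          # 3x3 depthwise: spatial
            nn.BatchNorm2d(hidden), nn.SiLU(inplace=True),
            nn.Conv2d(hidden, channels, 1, bias=False),    # 1x1 pointwise: project
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)  # residual keeps context fusion additive
```

Because the expensive 3 × 3 kernel runs depthwise, the block's cost grows roughly linearly with channel count, which is what keeps the neck lightweight.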
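Likewise, the multiscale edge-modeling idea in C2f_MSEIE can be sketched as follows, assuming pyramid-style bin sizes and a smoothing-residual edge definition; both are illustrative guesses at the Edge Enhancer's internals rather than the published formulation.

```python
# Sketch of a multiscale edge-modeling branch in the style of C2f_MSEIE
# (bin sizes and the edge formulation are assumptions for illustration):
# pooling at several bin sizes captures context, while the edge branch
# reinforces the high-frequency residual between a feature map and a
# smoothed copy of itself.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeEnhancer(nn.Module):
    """High-frequency residual: x minus a locally averaged x, reweighted."""
    def __init__(self, channels):
        super().__init__()
        self.smooth = nn.AvgPool2d(3, stride=1, padding=1)
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, x):
        edge = x - self.smooth(x)          # isolate the high-frequency component
        return x + self.gate(edge) * edge  # reinforce edges on top of x

class MultiScaleEdgeBranch(nn.Module):
    def __init__(self, channels, bins=(1, 2, 4)):
        super().__init__()
        self.bins = bins
        self.fuse = nn.Conv2d(channels * (len(bins) + 1), channels, 1)
        self.edge = EdgeEnhancer(channels)

    def forward(self, x):
        h, w = x.shape[-2:]
        pooled = [F.interpolate(F.adaptive_avg_pool2d(x, b), size=(h, w),
                                mode='bilinear', align_corners=False)
                  for b in self.bins]      # AdaptiveAvgPool2d at several bin sizes
        x = self.fuse(torch.cat([x, *pooled], dim=1))
        return self.edge(x)
```

Subtracting a locally averaged copy isolates high-frequency content, so reinforcing that residual sharpens object boundaries without disturbing the low-frequency context supplied by the pooled branches.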
The efficacy of ODEM–YOLO was validated on a custom dataset of 9877 images collected from actual cement aggregate workshops, featuring diverse small and multiscale targets under realistic, challenging conditions. ODEM–YOLO achieves an overall mean average precision (mAP@0.5) of 0.896 and a per-class AP@0.5 of 0.746 on the challenging small "mask" targets. Despite these accuracy gains, the model remains compact at 6.9 MB and processes a single image in 8.2 ms (9.5 GFLOPs), outperforming mainstream lightweight models such as YOLOv5n and YOLOv10n. Ablation studies confirm the individual and synergistic contributions of the ODConv, iEMA, and C2f_MSEIE modules. Deployed on an NVIDIA Jetson Nano B01 embedded device, ODEM–YOLO runs in real time at 25 frames per second, satisfying the requirements of industrial on-site safety monitoring.

In conclusion, ODEM–YOLO provides an effective and efficient solution for real-time safety wear detection in challenging industrial environments. Its architectural innovations target the difficulties of small and multiscale object detection, delivering substantial gains in accuracy and reliability while preserving the lightweight structure crucial for edge deployment, making it a practical tool for enhancing occupational safety and potentially reducing accident rates.