Abstract:
Safety compliance, particularly the correct use of personal protective equipment (PPE), is critical in high-risk industrial settings such as cement aggregate production workshops, where harsh conditions and dynamic operations render traditional manual supervision insufficient. Artificial-intelligence-driven video surveillance offers a promising alternative, but existing object detection models frequently struggle to identify small and multiscale targets accurately, leading to high error rates and limited practical value. To address these limitations, this paper introduces ODEM–YOLO, a lightweight yet highly accurate object detection model based on an enhanced YOLOv8 architecture and engineered specifically for robust safety wear detection.

ODEM–YOLO incorporates three key innovations. First, the omni-dimensional dynamic convolution (ODConv) module is integrated into the early backbone stages. Unlike standard convolutions with fixed kernels, ODConv employs a multidimensional attention mechanism that dynamically learns kernel weights along the spatial, input-channel, output-channel, and kernel-number dimensions, enabling adaptive focus on the salient features of small targets in complex scenes and sharpening shallow-level feature discrimination. Second, the neck network is optimized with an improved efficient multiscale attention (iEMA) mechanism built around an inverted residual mobile block core: 1 × 1 pointwise convolutions handle channel manipulation while 3 × 3 depthwise convolutions learn spatial features efficiently, allowing multiscale contextual information to be captured and fused at significantly reduced computational cost and improving the representation of PPE at diverse sizes. Third, a novel C2f multiscale edge information enhancement (C2f_MSEIE) module replaces the original C2f blocks to explicitly strengthen target edge information for clearer boundary definition. It comprises a local convolution branch that preserves fine-grained details and a multiscale edge-modeling branch that applies AdaptiveAvgPool2d at multiple bin sizes together with an Edge Enhancer submodule to extract and reinforce high-frequency edge features, yielding a more robust representation of object contours for precise localization.
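To make the ODConv mechanism concrete, the following is a minimal PyTorch sketch of an ODConv-style dynamic convolution; the class name, attention activations, and reduction ratio are illustrative assumptions that simplify the published design rather than reproduce it.

```python
# Minimal sketch of an ODConv-style dynamic convolution (a hypothetical
# simplification of the module described above; the paper's exact design
# may differ). Four attention branches reweight candidate kernels along
# the spatial, input-channel, output-channel, and kernel-number
# dimensions before they are summed into one kernel per sample.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ODConvSketch(nn.Module):
    def __init__(self, c_in, c_out, k=3, n_kernels=4, reduction=16):
        super().__init__()
        self.c_in, self.c_out, self.k, self.n = c_in, c_out, k, n_kernels
        # Candidate kernels: (n, c_out, c_in, k, k)
        self.weight = nn.Parameter(torch.randn(n_kernels, c_out, c_in, k, k) * 0.02)
        hidden = max(c_in // reduction, 8)
        self.fc = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                nn.Linear(c_in, hidden), nn.ReLU(inplace=True))
        # One attention head per kernel dimension
        self.att_spatial = nn.Linear(hidden, k * k)      # per kernel position
        self.att_cin     = nn.Linear(hidden, c_in)       # per input channel
        self.att_cout    = nn.Linear(hidden, c_out)      # per output channel
        self.att_n       = nn.Linear(hidden, n_kernels)  # per candidate kernel

    def forward(self, x):
        b, _, h, w = x.shape
        z = self.fc(x)                                   # (b, hidden)
        a_s = torch.sigmoid(self.att_spatial(z)).view(b, 1, 1, 1, self.k, self.k)
        a_i = torch.sigmoid(self.att_cin(z)).view(b, 1, 1, self.c_in, 1, 1)
        a_o = torch.sigmoid(self.att_cout(z)).view(b, 1, self.c_out, 1, 1, 1)
        a_n = torch.softmax(self.att_n(z), dim=1).view(b, self.n, 1, 1, 1, 1)
        # Modulate the candidates and aggregate one kernel per sample
        w_dyn = (a_n * a_o * a_i * a_s * self.weight.unsqueeze(0)).sum(dim=1)
        # Grouped-conv trick: apply a different kernel to each batch sample
        x = x.reshape(1, b * self.c_in, h, w)
        w_dyn = w_dyn.reshape(b * self.c_out, self.c_in, self.k, self.k)
        y = F.conv2d(x, w_dyn, padding=self.k // 2, groups=b)
        return y.reshape(b, self.c_out, h, w)

# Example: ODConvSketch(64, 128)(torch.randn(2, 64, 80, 80)) -> (2, 128, 80, 80)
```

The grouped-convolution trick at the end applies a different aggregated kernel to every sample in the batch with a single conv2d call, which is what makes per-input dynamic kernels affordable.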
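The inverted residual core underlying iEMA can be sketched in the same spirit; the expansion ratio, normalization, and activation choices below are assumptions for illustration, not the paper's exact configuration.

```python
# Sketch of an inverted-residual core as attributed to the iEMA module
# (names and expansion ratio are illustrative assumptions): 1x1 pointwise
# convs handle channel expansion/projection, a 3x3 depthwise conv learns
# spatial features cheaply, and a residual path preserves the input.
import torch
import torch.nn as nn

class InvertedResidualCore(nn.Module):
    def __init__(self, channels, expand_ratio=2):
        super().__init__()
        hidden = channels * expand_ratio
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),    # 1x1 pointwise: expand
            nn.BatchNorm2d(hidden), nn.SiLU(inplace=True),
            nn.Conv2d(hidden, hidden, 3, padding=1,
                      groups=hidden, bias=False),          # 3x3 depthwise: spatial
            nn.BatchNorm2d(hidden), nn.SiLU(inplace=True),
            nn.Conv2d(hidden, channels, 1, bias=False),    # 1x1 pointwise: project
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)  # residual keeps context fusion additive
```

Because the expensive 3 × 3 kernel runs depthwise, the block's cost grows roughly linearly with channel count, which is what keeps the neck lightweight.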
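Likewise, the multiscale edge-modeling idea in C2f_MSEIE can be sketched as follows, assuming pyramid-style bin sizes and a smoothing-residual edge definition; both are illustrative guesses at the Edge Enhancer's internals rather than the published formulation.

```python
# Sketch of a multiscale edge-modeling branch in the style of C2f_MSEIE
# (bin sizes and the edge formulation are assumptions for illustration):
# pooling at several bin sizes captures context, while the edge branch
# reinforces the high-frequency residual between a feature map and a
# smoothed copy of itself.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeEnhancer(nn.Module):
    """High-frequency residual: x minus a locally averaged x, reweighted."""
    def __init__(self, channels):
        super().__init__()
        self.smooth = nn.AvgPool2d(3, stride=1, padding=1)
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, x):
        edge = x - self.smooth(x)          # isolate the high-frequency component
        return x + self.gate(edge) * edge  # reinforce edges on top of x

class MultiScaleEdgeBranch(nn.Module):
    def __init__(self, channels, bins=(1, 2, 4)):
        super().__init__()
        self.bins = bins
        self.fuse = nn.Conv2d(channels * (len(bins) + 1), channels, 1)
        self.edge = EdgeEnhancer(channels)

    def forward(self, x):
        h, w = x.shape[-2:]
        pooled = [F.interpolate(F.adaptive_avg_pool2d(x, b), size=(h, w),
                                mode='bilinear', align_corners=False)
                  for b in self.bins]      # AdaptiveAvgPool2d at several bin sizes
        x = self.fuse(torch.cat([x, *pooled], dim=1))
        return self.edge(x)
```

Subtracting a locally averaged copy isolates high-frequency content, so reinforcing that residual sharpens object boundaries without disturbing the low-frequency context supplied by the pooled branches.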
The efficacy of ODEM–YOLO was validated on a custom dataset of 9877 images collected from actual cement aggregate workshops, featuring diverse small and multiscale targets under realistic, challenging conditions. ODEM–YOLO achieves an overall mean average precision (mAP@0.5) of 0.896 and a per-class AP@0.5 of 0.746 on the challenging small "mask" targets. Despite these accuracy gains, the model remains compact at 6.9 MB and processes a single image in 8.2 ms (9.5 GFLOPs), outperforming mainstream lightweight models such as YOLOv5n and YOLOv10n. Ablation studies confirm the individual and synergistic contributions of the ODConv, iEMA, and C2f_MSEIE modules. Deployed on an NVIDIA Jetson Nano B01 embedded device, ODEM–YOLO runs in real time at 25 frames per second, satisfying the requirements of industrial on-site safety monitoring.

In conclusion, ODEM–YOLO provides an effective and efficient solution for real-time safety wear detection in challenging industrial environments. Its architectural innovations target the difficulties of small and multiscale object detection, delivering substantial gains in accuracy and reliability while preserving the lightweight structure crucial for edge deployment, making it a practical tool for enhancing occupational safety and potentially reducing accident rates.