Abstract:
As pavements age, sustained traffic loading, environmental degradation, and material aging lead to more cracks, potholes, and other distresses, threatening safety and infrastructure longevity. Textured road surfaces, varying illumination, and narrow, low-contrast cracks make reliable detection challenging, leading to missed detections and false positives. Although deep learning has advanced automated pavement assessment, common lightweight single-stage detectors struggle to balance accuracy and speed on edge devices such as vehicle-mounted systems and unmanned aerial vehicles (UAVs). This is because repeated downsampling, fixed interpolation, and coarse cross-scale fusion blur crack boundaries and weaken texture cues, thereby reducing recall and degrading localization. To address these challenges, this study introduces YOLO-RDD, a lightweight, task-specific pavement distress detection algorithm derived from the YOLOv11n architecture. Designed explicitly for the intrinsic characteristics of pavement distress, including small spatial extent, high aspect ratio, low intensity contrast relative to the surrounding pavement, and susceptibility to masking by heterogeneous background textures, YOLO-RDD balances detection accuracy against real-time deployability on edge platforms. The method systematically strengthens feature fidelity, cross-scale alignment, and context-aware prediction
via targeted architectural refinement across the backbone, neck, and detection head. First, in the backbone network, the original C3k2 modules are replaced with a structurally reparameterized RepGELAN module. RepGELAN integrates progressive intralayer feature aggregation with cross-stage information feedback, thereby expanding the effective receptive field while preserving high-frequency edge responses and weak texture discriminability, which are critical for detecting narrow, low-contrast cracks. Structural reparameterization converts the multibranch training-time architecture into an equivalent single standard convolution at inference time, keeping computational overhead minimal without sacrificing representational expressiveness. Second, for the neck network, we propose DySlim-Neck, a lightweight, semantic-aware fusion architecture that combines a dynamic upsampling unit (DSU) with depthwise-separable convolutions. Before cross-layer fusion, the DSU performs content-adaptive alignment, explicitly correcting geometric misalignment and suppressing the over-smoothing artifacts commonly induced by fixed interpolation and naïve concatenation. Coupled with mixed convolutional aggregation, DySlim-Neck reduces fusion-induced latency and memory footprint while maintaining high-fidelity propagation of small-object features across scales, thereby safeguarding fine crack morphology and continuity. Third, the detection head adopts DynamicHead, which dynamically reweights and reorganizes multiscale features along the scale, spatial, and channel dimensions based on target-specific semantic cues. This adaptive coupling explicitly addresses the classification–regression decoupling problem prevalent in small, weakly contrasted targets, enhancing localization confidence and scale-invariant prediction robustness under heterogeneous pavement backgrounds.
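As an illustration of the structural reparameterization described for the backbone, the following is a minimal sketch rather than the RepGELAN implementation. It assumes a toy 1-D, single-channel setting with three parallel branches (a 3-tap convolution, a 1-tap convolution, and an identity shortcut, in the style of RepVGG) and omits batch normalization; the function names are hypothetical. It shows that the multibranch form and the fused single-kernel form produce the same output.

```python
# Toy 1-D, single-channel illustration of structural reparameterization.
# Assumption: branches are a 3-tap conv, a 1-tap conv, and an identity path
# (RepVGG-style); the real RepGELAN branch layout is not specified here.

def conv1d_same(x, kernel):
    """Cross-correlate x with an odd-length kernel, zero-padded to keep length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(x))
    ]

def multibranch(x, k3, k1):
    """Training-time form: sum of the 3-tap branch, 1-tap branch, and identity."""
    y3 = conv1d_same(x, k3)
    y1 = conv1d_same(x, k1)
    return [a + b + c for a, b, c in zip(y3, y1, x)]

def fuse(k3, k1):
    """Inference-time form: fold the 1-tap and identity branches into the
    centre tap of a single 3-tap kernel."""
    fused = list(k3)
    fused[1] += k1[0] + 1.0  # identity is a 1-tap kernel with weight 1
    return fused

# Illustrative values only (not taken from the paper).
x = [0.5, -1.0, 2.0, 0.0, 3.5]
k3 = [0.25, -0.5, 0.75]
k1 = [2.0]

y_train = multibranch(x, k3, k1)
y_infer = conv1d_same(x, fuse(k3, k1))
max_diff = max(abs(a - b) for a, b in zip(y_train, y_infer))
```

Because convolution is linear, the branch outputs can be summed at the kernel level instead of the activation level, which is why the fused kernel costs a single convolution at inference time.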
Furthermore, an embedded dynamic activation mechanism selectively suppresses background clutter responses while amplifying discriminative crack signatures, yielding higher precision in ambiguous regions. The proposed algorithm is evaluated on the RDD2022 benchmark dataset using four complementary metrics: F1-score (harmonic mean of precision and recall), mean Average Precision (mAP) at IoU thresholds of 0.5 and 0.5:0.95, parameter count (in millions), and computational cost measured as GFLOPs and inference latency (ms) on an embedded Jetson AGX Orin platform. Experimental results demonstrate that, relative to the YOLOv11n baseline, YOLO-RDD achieves a 19.7% reduction in parameter count while delivering consistent gains across detection robustness metrics, improving F1-score by 1.0 percentage point and mAP@0.5 by 2.3 percentage points without compromising mAP@0.5:0.95, indicating enhanced localization consistency under varying overlap criteria. Notably, ablation-guided analysis confirms that these improvements are particularly pronounced for fine-scale distress: the detection recall for narrow cracks (< 2 pixels wide) increases by 8.6%, and the pothole localization accuracy (measured by bounding-box IoU) improves by 5.4%, especially under low-illumination conditions and on highly textured asphalt surfaces. YOLO-RDD performs well in pavement defect detection, particularly for narrow cracks. However, its stability decreases under real-world challenges such as uneven lighting, dirt, varying camera angles, and limited training diversity. Future work will (1) expand real-world road data across times of day, road types, and capture conditions, and augment the training set with generated crack/damage samples to improve generalization; and (2) adopt transfer and continual learning for cross-region adaptation and fuse multi-source data into a pavement health assessment framework to accelerate engineering deployment.
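For reference, two of the quantities named in the evaluation can be written out directly. The sketch below is illustrative and is not the evaluation code used in this study: it assumes axis-aligned boxes given as (x1, y1, x2, y2), the function names are hypothetical, and the numeric inputs are made-up examples rather than reported results.

```python
def f1_score(precision, recall):
    """F1-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

def box_iou(a, b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Hypothetical inputs, for illustration only.
example_f1 = f1_score(0.8, 0.6)
example_iou = box_iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0))

# A prediction counts as a true positive at mAP@0.5 only if IoU >= 0.5;
# mAP@0.5:0.95 averages AP over thresholds 0.50, 0.55, ..., 0.95.
is_match_at_50 = example_iou >= 0.5
```

The stricter thresholds in mAP@0.5:0.95 reward tighter boxes, which is why that metric is more sensitive to localization quality than mAP@0.5.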