Abstract:
Wire bonding, a critical step in integrated circuit packaging, interconnects chips and other components to ensure correct circuit functionality, and the quality inspection of bond wires directly affects product yield. To address the low detection accuracy and efficiency of existing bond wire defect detection methods, particularly for dense, microscale, and geometrically irregular defects, this study proposes a novel defect detection model, Depth-YOLO. The proposed framework integrates multi-modal depth features and hierarchical attention mechanisms to overcome the limitations of conventional RGB-based approaches in complex industrial environments. First, the input stage of the YOLOv8 architecture is reconstructed to process 4-channel pseudo-RGBD data, combining single-channel depth maps with three-channel normal maps derived from gradient-based geometric mapping. This enables the model to capture texture and three-dimensional (3D) spatial features that are critical for detecting defects such as wire curvature anomalies and bridge faults. Second, an input feature enhancement module (Enhance) is designed to hierarchically extract depth and geometric information. The Enhance module employs multi-scale convolutions (3×3, 5×5, and 7×7 kernels) for depth feature amplification, Sobel operators for surface gradient extraction, and dual-attention fusion (channel-spatial attention) to weight critical regions, improving depth-aware feature representation by 2.8% compared with the baseline. To optimize computational efficiency, the original C2f module in the YOLOv8 backbone is replaced with a lightweight C2f_Faster module. This modification introduces partial convolution (Partial_conv3), which processes only 25% of the input channels, coupled with DropPath regularization to mitigate overfitting; experimental results show a 10% reduction in GFLOPs while maintaining 89.8% of the baseline accuracy. Furthermore, a multidimensional feature attention (MDFA) mechanism is proposed to address diverse defect morphologies. By synergistically integrating channel-aware feature mixing (CAFM) for global dependency modeling, multi-level context attention (MLCA) for dynamic receptive field adjustment, and cross-phase context aggregation (CPCA) with asymmetric convolutions (e.g., 1×7 and 7×1 kernels), MDFA achieves a 4% recall improvement on irregular defects compared with single-attention baselines. Finally, the original CIoU loss function is replaced with Wise-IoU (WIoU) to enhance bounding box regression stability; WIoU dynamically weights training samples according to annotation quality, reducing the gradient dominance of low-quality examples. Depth-YOLO is compared with state-of-the-art detectors such as RT-DETR-L (98.9% mAP@0.5 with 1.1× higher FLOPs). Ablation studies confirm the necessity of multi-modal fusion: using RGB-only inputs degrades mAP@0.5 by 14.7%, and disabling MDFA reduces recall on irregular defects by 18.4%. Practical deployment tests on an NVIDIA Jetson AGX Xavier show real-time inference at 18 FPS with 1.2 GB of memory usage, meeting industrial throughput requirements. This methodology not only enables high-precision automated inspection of semiconductor bond wires but also provides a scalable framework for defect detection in other integrated circuit manufacturing stages, such as solder joint inspection and wafer surface analysis.
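To make the 4-channel pseudo-RGBD input concrete, the sketch below shows one common way to derive a three-channel normal map from a single-channel depth map via image gradients and stack it with the depth channel. It is an illustrative example, not the published implementation; the function name, channel ordering, and normalization are assumptions.

```python
# Illustrative sketch (assumed, not the paper's code) of pseudo-RGBD construction:
# depth -> gradient-based surface normals (3 channels) + normalized depth (1 channel).
import numpy as np

def depth_to_pseudo_rgbd(depth: np.ndarray) -> np.ndarray:
    """depth: (H, W) array -> (4, H, W) pseudo-RGBD tensor (normals + depth)."""
    depth = depth.astype(np.float32)
    # Gradient-based geometric mapping: estimate surface normals from depth gradients.
    dz_dy, dz_dx = np.gradient(depth)
    normals = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=0)
    normals /= np.linalg.norm(normals, axis=0, keepdims=True) + 1e-8  # unit normals
    # Rescale depth to [0, 1] so all four channels share a comparable range (assumption).
    d = (depth - depth.min()) / (depth.max() - depth.min() + 1e-8)
    return np.concatenate([normals, d[None]], axis=0)  # (4, H, W)

# Example with a synthetic 640x640 depth map.
rgbd = depth_to_pseudo_rgbd(np.random.rand(640, 640))
print(rgbd.shape)  # (4, 640, 640)
```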
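The Enhance module's ingredients (multi-scale 3×3/5×5/7×7 convolutions, Sobel gradient extraction, and channel-spatial attention) can be composed as in the following minimal PyTorch sketch. Channel widths, the fusion layer, and the CBAM-style attention pair are my own assumptions about how these pieces fit together, not the authors' exact design.

```python
# Assumed sketch of an input feature enhancement ("Enhance") block:
# multi-scale convolutions + fixed Sobel filters + channel-then-spatial attention.
import torch
import torch.nn as nn

class Enhance(nn.Module):
    def __init__(self, in_ch: int = 4, out_ch: int = 32):
        super().__init__()
        # Multi-scale depth feature amplification (3x3, 5x5, 7x7 kernels).
        self.branches = nn.ModuleList(
            [nn.Conv2d(in_ch, out_ch, k, padding=k // 2) for k in (3, 5, 7)]
        )
        # Fixed Sobel filters, applied per input channel, for surface-gradient cues.
        sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        sobel = torch.stack([sobel_x, sobel_x.t()]).unsqueeze(1).repeat(in_ch, 1, 1, 1)
        self.register_buffer("sobel", sobel)                 # (2*in_ch, 1, 3, 3)
        self.fuse = nn.Conv2d(3 * out_ch + 2 * in_ch, out_ch, 1)
        # Dual attention: channel attention followed by spatial attention.
        self.ca = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                nn.Conv2d(out_ch, out_ch, 1), nn.Sigmoid())
        self.sa = nn.Sequential(nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        ms = torch.cat([b(x) for b in self.branches], dim=1)
        grad = nn.functional.conv2d(x, self.sobel, padding=1, groups=x.shape[1])
        f = self.fuse(torch.cat([ms, grad], dim=1))
        f = f * self.ca(f)                                   # channel attention
        s = torch.cat([f.mean(1, keepdim=True), f.amax(1, keepdim=True)], dim=1)
        return f * self.sa(s)                                # spatial attention

print(Enhance()(torch.randn(1, 4, 640, 640)).shape)  # torch.Size([1, 32, 640, 640])
```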
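The partial-convolution idea behind C2f_Faster (convolving only 25% of the channels, as in FasterNet's PConv, plus DropPath on the residual branch) is sketched below. The surrounding block structure and hyperparameters are assumptions for illustration; only the 25% channel ratio and the use of DropPath come from the abstract.

```python
# Assumed sketch of partial convolution (Partial_conv3-style) with DropPath regularization.
import torch
import torch.nn as nn

class PartialConv3(nn.Module):
    def __init__(self, dim: int, ratio: float = 0.25):
        super().__init__()
        self.dim_conv = int(dim * ratio)                 # only these channels are convolved
        self.conv = nn.Conv2d(self.dim_conv, self.dim_conv, 3, padding=1, bias=False)

    def forward(self, x):
        x1, x2 = torch.split(x, [self.dim_conv, x.shape[1] - self.dim_conv], dim=1)
        return torch.cat([self.conv(x1), x2], dim=1)     # remaining 75% pass through

class DropPath(nn.Module):
    """Stochastic depth: randomly drop whole residual branches during training."""
    def __init__(self, p: float = 0.1):
        super().__init__()
        self.p = p

    def forward(self, x):
        if not self.training or self.p == 0.0:
            return x
        keep = 1.0 - self.p
        mask = x.new_empty(x.shape[0], 1, 1, 1).bernoulli_(keep) / keep
        return x * mask

class FasterBlock(nn.Module):
    def __init__(self, dim: int = 64, drop_path: float = 0.1):
        super().__init__()
        self.pconv = PartialConv3(dim)
        self.mlp = nn.Sequential(nn.Conv2d(dim, dim * 2, 1), nn.ReLU(),
                                 nn.Conv2d(dim * 2, dim, 1))
        self.drop_path = DropPath(drop_path)

    def forward(self, x):
        return x + self.drop_path(self.mlp(self.pconv(x)))

print(FasterBlock()(torch.randn(1, 64, 80, 80)).shape)  # torch.Size([1, 64, 80, 80])
```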
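The asymmetric 1×7/7×1 convolutions mentioned for MDFA can be realized as a cheap, elongated spatial attention, as in the simplified sketch below. This shows only the asymmetric-kernel component; the full CAFM + MLCA + CPCA composition is not reproduced, and the module name and sigmoid gating are assumptions.

```python
# Assumed sketch of an asymmetric-convolution spatial attention (1x7 followed by 7x1).
import torch
import torch.nn as nn

class AsymmetricSpatialAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        # Depth-wise 1x7 then 7x1 approximates a 7x7 receptive field at lower cost.
        self.conv_h = nn.Conv2d(dim, dim, (1, 7), padding=(0, 3), groups=dim)
        self.conv_v = nn.Conv2d(dim, dim, (7, 1), padding=(3, 0), groups=dim)
        self.proj = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        attn = torch.sigmoid(self.proj(self.conv_v(self.conv_h(x))))
        return x * attn   # reweights elongated / irregular regions such as bent wires

x = torch.randn(1, 128, 40, 40)
print(AsymmetricSpatialAttention(128)(x).shape)  # torch.Size([1, 128, 40, 40])
```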
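For the bounding-box regression change, the sketch below implements the basic Wise-IoU (WIoU v1) formulation: the IoU loss is scaled by a center-distance factor computed over the smallest enclosing box, with that denominator detached from the gradient. The dynamic, quality-based sample weighting of later WIoU versions is omitted here for brevity, so this is a simplified stand-in rather than the exact loss used in the paper.

```python
# Assumed sketch of a WIoU v1-style bounding-box loss (boxes as x1, y1, x2, y2).
import torch

def wiou_v1_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    # Intersection over union.
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)

    # Smallest enclosing box; its diagonal is detached from the gradient.
    enc_lt = torch.min(pred[:, :2], target[:, :2])
    enc_rb = torch.max(pred[:, 2:], target[:, 2:])
    enc_wh = enc_rb - enc_lt
    c2 = (enc_wh[:, 0] ** 2 + enc_wh[:, 1] ** 2).detach() + eps

    # Center-distance focusing factor.
    cp = (pred[:, :2] + pred[:, 2:]) / 2
    ct = (target[:, :2] + target[:, 2:]) / 2
    rho2 = ((cp - ct) ** 2).sum(dim=1)
    r_wiou = torch.exp(rho2 / c2)
    return (r_wiou * (1.0 - iou)).mean()

pred = torch.tensor([[10., 10., 50., 50.]], requires_grad=True)
gt = torch.tensor([[12., 14., 48., 52.]])
print(wiou_v1_loss(pred, gt))  # scalar loss, differentiable w.r.t. pred
```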