Improved YOLOv5s small target detection algorithm based on multiple attention mechanisms

Ma Ge (马鸽); Li Hongwei (李洪伟); Yan Ziwei (严梓维); Liu Zhijie (刘志杰); Zhao Zhijia (赵志甲)

doi:10.13374/j.issn2095-9389.2024.01.18.003

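Summary: In traffic sign recognition applications, most of the targets to be detected are small. Because small targets carry little information, require high localization accuracy, and are easily drowned out by environmental noise, they are currently the main difficulty in traffic sign detection. To address missed detections, false detections, and low detection accuracy for small traffic signs, this paper designs an STD-YOLOv5s (small target detection YOLOv5s) model. First, increasing the number of upsampling and Prediction output layers yields richer positional information, which solves the problem of insufficient information when YOLOv5s processes small targets and strengthens the model's global understanding of the image. Second, a CA (coordinate attention) mechanism is added after each C3 module and a Swin-T attention module is added before each output layer, which improves the network's capture of multilevel feature information and thus its small target detection performance. Finally, the SIoU penalty function, which accounts for both target shape and spatial relationships, is used to better capture the positional relationships of targets of different sizes in the image and to improve localization accuracy. The proposed model was validated on the TT100K dataset. The experimental results show that the method preserves the lightweight design and speed of YOLOv5s while improving precision, recall, and average precision, thereby making small target detection more accurate.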
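A coordinate attention (CA) block, as proposed by Hou et al. (CVPR 2021), factorizes channel attention into two one-dimensional poolings, one along the height and one along the width, so the resulting gates encode where a feature lies and not only which channel it belongs to. The following is a minimal, dependency-free sketch of a CA block applied to a single C × H × W feature map stored as nested lists. It illustrates the general technique under stated assumptions and is not the authors' implementation: the weight matrices `w1`, `w_h`, `w_w` and all helper names are hypothetical, batch normalization and biases are omitted, and the hard-swish activation follows the reference CA design rather than anything specified in this paper.

```python
import math
from typing import List

Matrix = List[List[float]]         # [rows][cols]
Tensor3 = List[List[List[float]]]  # [C][H][W]


def _sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def _h_swish(v: float) -> float:
    # Hard swish: v * ReLU6(v + 3) / 6
    return v * min(max(v + 3.0, 0.0), 6.0) / 6.0


def _conv1x1(weight: Matrix, feat: Matrix) -> Matrix:
    """1x1 convolution on a [C_in][L] map: out[o][l] = sum_i weight[o][i] * feat[i][l]."""
    length = len(feat[0])
    return [[sum(w_oi * feat[i][l] for i, w_oi in enumerate(row)) for l in range(length)]
            for row in weight]


def coordinate_attention(x: Tensor3, w1: Matrix, w_h: Matrix, w_w: Matrix) -> Tensor3:
    """Coordinate attention on one feature map x of shape [C][H][W] (hypothetical sketch).

    w1:  shared reduction weights, shape [C/r][C]
    w_h: height-branch weights, shape [C][C/r]
    w_w: width-branch weights,  shape [C][C/r]
    """
    c, h, w = len(x), len(x[0]), len(x[0][0])
    # 1) Directional pooling: average over width -> [C][H]; average over height -> [C][W].
    z_h = [[sum(x[k][i]) / w for i in range(h)] for k in range(c)]
    z_w = [[sum(x[k][i][j] for i in range(h)) / h for j in range(w)] for k in range(c)]
    # 2) Concatenate along the spatial axis -> [C][H + W], shared 1x1 conv, nonlinearity.
    z = [z_h[k] + z_w[k] for k in range(c)]
    f = [[_h_swish(v) for v in row] for row in _conv1x1(w1, z)]
    # 3) Split back into height and width parts and map each to C sigmoid gates.
    f_h = [row[:h] for row in f]
    f_w = [row[h:] for row in f]
    g_h = [[_sigmoid(v) for v in row] for row in _conv1x1(w_h, f_h)]
    g_w = [[_sigmoid(v) for v in row] for row in _conv1x1(w_w, f_w)]
    # 4) Reweight every position by its row gate and its column gate.
    return [[[x[k][i][j] * g_h[k][i] * g_w[k][j] for j in range(w)] for i in range(h)]
            for k in range(c)]
```

With all-zero weights every gate equals sigmoid(0) = 0.5, so the block simply scales the input by 0.25; trained weights instead raise or lower the gates for particular rows and columns. In a real network the same steps would be expressed with framework layers, for example PyTorch's `nn.AdaptiveAvgPool2d` for the directional pooling and 1 × 1 `nn.Conv2d` layers for the transforms.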

Abstract: Traffic sign detection and recognition enable real-time monitoring and interpretation of road signs, such as those indicating speed limits, no-overtaking zones, and navigation cues, and therefore have important applications in autonomous driving and decision-making systems. Designing accurate and efficient algorithms for automatic traffic sign recognition is thus crucial in the intelligent transportation field. However, most of the targets that traffic sign recognition applications must detect are small, which makes automatic recognition difficult. The YOLOv5s model, characterized by its shallow depth and narrow feature maps, is widely used for detection because it is lightweight and easy to deploy. YOLOv5s also follows an anchor-based prediction approach, regressing and classifying targets with anchor boxes of different sizes and shapes. This approach generates dense anchor boxes and lets the model perform object classification and bounding box regression directly, which improves target recall. The anchor-based YOLOv5s method has therefore been applied to traffic sign detection, but it still suffers from false positives and missed detections. Small targets remain a challenging aspect of current traffic sign recognition technology for three reasons: they carry little information, they require high localization precision, and environmental noise can easily overwhelm them. To address these missed detections, false positives, and low detection accuracy, this study proposes STD-YOLOv5s, a model specifically designed for small target detection. First, increasing the number of upsampling and prediction output layers provides richer location information, which enhances the model's global understanding of images and addresses the insufficient information associated with small targets. Second, a coordinate attention (CA) mechanism is added after each C3 module and a Swin-T attention module is added before each output layer, strengthening the model's ability to capture multilayer feature information and thus improving small target detection. Finally, the SIoU penalty function, which considers both target shape and spatial relationships, improves localization accuracy by better capturing the positional relationships among targets of different sizes in the image. The STD-YOLOv5s model was validated through ablation and comparison experiments on the TT100K dataset. Experimental results indicate that the proposed model maintains the lightweight nature and high detection speed of YOLOv5s while improving precision, recall, and average precision.


Improved small target detection algorithm based on multiattention and YOLOv5s for traffic sign recognition