Rotated object detection using adaptive angle classification and dynamic sample matching

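  • Summary: Rotated object detection aims to accurately identify objects oriented in arbitrary directions and is widely used in complex scenarios such as remote sensing and industrial character recognition. To address the discontinuity of angle regression and the instability of sample matching, this paper proposes a new angle-classification-based detection framework that makes two key improvements to YOLOv8. First, a shape-aware adaptive angle smoothing label (SA-ASL) that incorporates target geometry turns angle prediction from a regression problem into an adaptive label classification problem, which improves the accuracy and stability of angle prediction. Second, a progressive dynamic positive/negative sample matching mechanism that fuses horizontal and rotated IoU improves the quality of positive sample selection during training. The method achieves an mAP of 0.786 on the public DOTA dataset and 0.924 on an industrial character dataset, showing good generalization and robustness and demonstrating its practical value for rotated object detection.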

     

    Abstract: Rotated object detection (ROD) is a critical subtask in computer vision, particularly in real-world applications such as aerial remote sensing and industrial character detection, where objects frequently appear in arbitrary orientations with diverse aspect ratios. Unlike standard object detection, which assumes axis-aligned bounding boxes, ROD requires precise estimation of both object location and orientation. Conventional methods that regress the rotation angle directly suffer from the periodicity of angles and the resulting boundary discontinuity, which leads to unstable training and inaccurate predictions. In addition, in densely packed scenes with complex backgrounds, the assignment of positive and negative samples becomes highly sensitive, often leading to suboptimal convergence. To address these challenges, this study proposes a rotation-aware object detection approach based on YOLOv8, enhanced through two key components: a shape-aware adaptive angle classification strategy and a progressive dynamic matching mechanism. The angle classification strategy replaces traditional continuous angle regression with discrete angle classification. Angle annotations are transformed into soft label vectors using a circular Gaussian window function, which preserves angle periodicity. The novel element of this design is the incorporation of target shape information: the smoothing parameter (window width) of the label distribution is adjusted adaptively according to the object's aspect ratio. For elongated targets such as ships or text lines, a narrow window enforces sharp classification around the true angle, enabling fine-grained orientation discrimination. Conversely, for square-like or low-aspect-ratio objects, a wider window accommodates angular ambiguity and stabilizes training across diverse target geometries. This shape-aware mechanism mitigates angular discontinuities and improves classification accuracy in multi-oriented detection tasks.
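The shape-aware label construction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes angles in degrees over a 180° period with 1° bins. The rule that maps aspect ratio to the Gaussian window width (`adaptive_sigma`, with hypothetical `sigma_min`/`sigma_max` values) is also an assumption, because the exact formula is not given here.

```python
import math
from typing import List

ANGLE_PERIOD = 180.0  # assumption: long-edge box convention, angle in [0, 180)
NUM_BINS = 180        # assumption: one classification bin per degree


def adaptive_sigma(width: float, height: float,
                   sigma_min: float = 2.0, sigma_max: float = 8.0) -> float:
    """Hypothetical shape-aware window width: elongated boxes get a narrow window."""
    aspect_ratio = max(width, height) / max(min(width, height), 1e-6)
    # Square boxes (aspect ratio 1) get sigma_max; large ratios shrink toward sigma_min.
    return max(sigma_min, sigma_max / aspect_ratio)


def circular_smooth_label(theta: float, sigma: float,
                          num_bins: int = NUM_BINS,
                          period: float = ANGLE_PERIOD) -> List[float]:
    """Soft label vector from a Gaussian window wrapped around the angle period."""
    bin_width = period / num_bins
    theta = theta % period
    labels = []
    for i in range(num_bins):
        diff = abs(i * bin_width - theta)
        d = min(diff, period - diff)  # circular distance: 178 deg is 2 deg from 0 deg
        labels.append(math.exp(-(d * d) / (2.0 * sigma * sigma)))
    return labels


def decode_angle(scores: List[float], period: float = ANGLE_PERIOD) -> float:
    """Map the highest-scoring bin back to an angle in degrees."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return best * period / len(scores)


# Usage: a hypothetical elongated ship box vs. a nearly square box, both at 178 deg.
ship_label = circular_smooth_label(178.0, adaptive_sigma(120, 20))
square_label = circular_smooth_label(178.0, adaptive_sigma(40, 38))
```

For the hypothetical 120×20 box, `adaptive_sigma` returns 2.0, so the label decays within a few bins of 178°. The nearly square 40×38 box gets a window of about 7.6, which spreads probability over neighbouring angles. Because the distance is circular, bin 0 still receives a substantial label value for a 178° target. This wrap-around is the periodicity the circular window is meant to preserve.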
To complement the angle classification, a progressive dynamic sample matching mechanism is introduced to improve the quality of positive sample selection during training. Rotated IoU (rIoU) is unreliable early in training, when angle predictions are still inaccurate, so instead of relying solely on it, the method begins with horizontal IoU (hIoU) and gradually incorporates rIoU through linear interpolation as training proceeds. The final matching score integrates three components: classification confidence, IoU-based localization quality, and a cosine-based angle-consistency term. This unified metric guides the selection of the top-K positive samples for each ground-truth object, favoring high-quality matches and suppressing low-quality or ambiguous ones. The progressive transition improves training stability, accelerates convergence, and improves rotational alignment between predictions and ground truth. Extensive experiments are conducted on two datasets. On the DOTA dataset, which includes multiple object classes with diverse orientations and aspect ratios, the proposed method achieves a mean Average Precision (mAP) of 0.786, with notable improvements in high-aspect-ratio categories such as ships, vehicles, and containers. On a custom industrial character dataset of densely arranged, multi-oriented alphanumeric components captured under complex conditions, the method achieves an mAP of 0.924, demonstrating strong generalization to scene-text-like tasks. Ablation studies isolate the contribution of each component: the shape-aware classification yields a 4.3% improvement in angle-sensitive categories, while the dynamic matching strategy produces smoother loss curves and more concentrated attention heatmaps. The method preserves the anchor-free structure and real-time inference capability of YOLOv8 while substantially improving performance in rotation-sensitive settings.
All modifications are lightweight and easily integrable into existing pipelines without structural changes to the backbone.
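The progressive matching score can likewise be sketched as a hypothetical illustration. Several details are assumptions chosen to fit the description of classification confidence, interpolated hIoU/rIoU, a cosine angle term and top-K selection:

- the linear warm-up schedule (`warmup_epochs`);
- the (1 + cos 2Δθ)/2 form of the angle-consistency term;
- the product with illustrative exponents `alpha`, `beta` and `gamma`.

Horizontal and rotated IoU are taken as precomputed inputs, because computing rotated-polygon overlap is outside the scope of this sketch.

```python
import math
from typing import List, Sequence


def iou_mix_weight(epoch: int, warmup_epochs: int) -> float:
    """Hypothetical linear schedule: 0 -> hIoU only, 1 -> rIoU only."""
    if warmup_epochs <= 0:
        return 1.0
    return min(1.0, max(0.0, epoch / warmup_epochs))


def angle_consistency(pred_deg: float, gt_deg: float) -> float:
    """Cosine term with a 180-degree period: 1 when aligned, 0 when perpendicular."""
    delta = math.radians(pred_deg - gt_deg)
    return 0.5 * (1.0 + math.cos(2.0 * delta))


def matching_score(cls_conf: float, h_iou: float, r_iou: float,
                   pred_deg: float, gt_deg: float,
                   epoch: int, warmup_epochs: int,
                   alpha: float = 0.5, beta: float = 6.0, gamma: float = 1.0) -> float:
    """Combine confidence, interpolated IoU and angle consistency (assumed product form)."""
    w = iou_mix_weight(epoch, warmup_epochs)
    iou = (1.0 - w) * h_iou + w * r_iou  # linear interpolation from hIoU to rIoU
    return (cls_conf ** alpha) * (iou ** beta) * (angle_consistency(pred_deg, gt_deg) ** gamma)


def select_topk(scores: Sequence[float], k: int) -> List[int]:
    """Indices of the k highest-scoring candidates for one ground-truth object."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
```

At epoch 0 the IoU term uses only hIoU, so inaccurate early angle predictions do not collapse the localization score. Once `epoch` reaches `warmup_epochs`, only rIoU is used. The cos 2Δθ form treats 179° and 1° as nearly aligned, which is consistent with a 180° angle period.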

     
