Abstract:
Rotated object detection (ROD) is a critical subtask in computer vision, particularly in real-world applications such as aerial remote sensing and industrial character detection, where objects frequently appear in arbitrary orientations with diverse aspect ratios. Unlike standard object detection, which assumes axis-aligned bounding boxes, ROD requires precise estimation of both object location and orientation. Conventional angle-regression methods suffer from the periodicity and boundary discontinuity of the angle parameter, which cause unstable training and inaccurate predictions. In addition, densely packed scenes with complex backgrounds make the assignment of positive and negative samples highly sensitive, often leading to suboptimal convergence. To address these challenges, this study proposes a rotation-aware object detection approach based on YOLOv8, enhanced through two key components: a shape-aware adaptive angle classification strategy and a progressive dynamic matching mechanism. The angle classification strategy replaces continuous angle regression with discrete angle classification: angle annotations are transformed into soft label vectors using a circular Gaussian window function, which preserves angle periodicity. A novel feature of this design is the incorporation of target shape information, whereby the smoothing parameter of the label distribution is adjusted adaptively according to the object's aspect ratio. For elongated targets such as ships or text lines, a narrow window enforces sharp classification around the true angle, enabling fine-grained orientation discrimination; for square-like, low-aspect-ratio objects, a wider window accommodates angular ambiguity and stabilizes training across diverse target geometries. This shape-aware mechanism mitigates angular discontinuities and improves classification accuracy in multi-oriented detection tasks. To complement the angle classification, a progressive dynamic sample matching mechanism is introduced to improve the quality of positive sample selection during training. Instead of relying solely on rotated IoU (rIoU), which is unreliable early in training when angle predictions are still inaccurate, the method begins with horizontal IoU (hIoU) and gradually incorporates rIoU through linear interpolation as training proceeds. The final matching score integrates three components: classification confidence, IoU-based localization quality, and a cosine-based angle-consistency term. This unified metric guides the selection of the top-K positive samples for each ground-truth object, emphasizing high-quality matches while suppressing low-quality or ambiguous ones. The progressive transition improves training stability, accelerates convergence, and enhances rotation alignment between predictions and ground truth. Extensive experiments are conducted on two datasets. On the DOTA dataset, which includes multiple object classes with diverse orientations and aspect ratios, the proposed method achieves a mean Average Precision (mAP) of 0.786, with notable improvements in high-aspect-ratio categories such as ships, vehicles, and containers. On a custom industrial character dataset consisting of densely arranged, multi-oriented alphanumeric components captured under complex conditions, the method achieves a mAP of 0.924, demonstrating strong generalization to scene-text-like tasks.
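The minimal sketch below illustrates the two components described above; it is not the authors' implementation. The 180 one-degree angle bins, the linear mapping from aspect ratio to Gaussian width, the specific form of the cosine consistency term, and the multiplicative fusion of the score components are all assumptions made for illustration; only the overall structure (circular Gaussian soft labels, hIoU-to-rIoU interpolation, top-K selection) follows the description above.

```python
# Illustrative sketch only (not the paper's code). Assumes 180 one-degree
# angle bins, a linear aspect-ratio -> sigma schedule, and a multiplicative
# fusion of the matching-score terms; all names and constants are hypothetical.
import numpy as np

NUM_BINS = 180  # assumed discretization: one bin per degree in [0, 180)


def circular_gaussian_label(angle_deg, aspect_ratio,
                            sigma_min=1.0, sigma_max=6.0, ar_max=8.0):
    """Shape-aware soft label over angle bins (narrow for elongated objects)."""
    # Map aspect ratio in [1, ar_max] to a window width in [sigma_max, sigma_min]:
    # elongated targets (ships, text lines) get a sharp window,
    # near-square targets a wide, more tolerant one.
    ar = np.clip(aspect_ratio, 1.0, ar_max)
    sigma = sigma_max - (sigma_max - sigma_min) * (ar - 1.0) / (ar_max - 1.0)

    bins = np.arange(NUM_BINS)
    # Circular distance preserves angle periodicity (bin 179 is adjacent to bin 0).
    dist = np.abs(bins - (angle_deg % NUM_BINS))
    dist = np.minimum(dist, NUM_BINS - dist)
    label = np.exp(-0.5 * (dist / sigma) ** 2)
    return label / label.sum()


def matching_score(cls_conf, h_iou, r_iou, angle_err_deg, progress):
    """Progressive score: hIoU early, rIoU late, weighted by angle consistency."""
    alpha = np.clip(progress, 0.0, 1.0)           # training progress in [0, 1]
    iou = (1.0 - alpha) * h_iou + alpha * r_iou   # linear hIoU -> rIoU interpolation
    angle_term = np.clip(np.cos(np.deg2rad(angle_err_deg)), 0.0, 1.0)
    return cls_conf * iou * angle_term            # assumed multiplicative fusion


def select_topk(scores, k=10):
    """Keep the k highest-scoring candidates as positives for one ground truth."""
    return np.argsort(scores)[::-1][:k]


# Example: an elongated box (AR = 7) yields a sharply peaked label,
# while a near-square box (AR = 1.2) yields a much flatter one.
print(circular_gaussian_label(30, aspect_ratio=7.0).max())
print(circular_gaussian_label(30, aspect_ratio=1.2).max())
print(matching_score(0.8, h_iou=0.6, r_iou=0.3, angle_err_deg=15.0, progress=0.1))
```

In this sketch, the key design points are that elongated targets receive a sharply peaked label while near-square targets receive a flatter one, and that rIoU only dominates the matching score once training has progressed and angle predictions have become reliable.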
Ablation studies isolate the contribution of each component: the shape-aware classification yields a 4.3% improvement in angle-sensitive categories, while the dynamic matching strategy produces smoother loss curves and more concentrated attention heatmaps. The method preserves the anchor-free structure and real-time inference capability of YOLOv8 while substantially improving performance in rotation-sensitive contexts. All modifications are lightweight and easily integrable into existing pipelines without structural changes to the backbone.