Abstract:
In the realm of autonomous driving, the design and implementation of collaborative multi-task perception algorithms pose a significant set of challenges, rooted primarily in the need for real-time processing speed, effective feature sharing among diverse tasks, and seamless information fusion. Addressing these concerns is critical for enhancing the overall safety and efficiency of autonomous systems navigating complex traffic environments. To tackle these issues, we introduce MTEPN, a deep convolutional neural network designed to perform multiple visual tasks concurrently. The framework targets three essential objectives: vehicle target detection, extraction of drivable road areas, and segmentation of lane lines. By integrating these tasks into a unified model, MTEPN enhances the perceptual capabilities of autonomous driving systems and improves their ability to operate effectively in real-world settings.

The foundation of MTEPN is the CSPDarkNet network, which extracts fundamental features from traffic scene images. By leveraging a horizontal connection mechanism, this backbone strengthens the model's feature extraction capability and establishes a robust basis for subsequent multi-task processing. This initial step is crucial, as the quality of the extracted features determines the performance of the entire system.

Following feature extraction, we introduce a multi-channel deformable feature aggregation module, termed C2f-K, which captures fine-grained global image features through cross-layer information fusion. By integrating features across different scales, C2f-K reduces background noise and interference, thereby improving the model's understanding of complex scenes.

To further improve efficiency and accuracy, we propose an orthogonal attention mechanism, HWAttention, which minimizes computational load while amplifying significant spatial features in the input images. By selectively focusing on critical regions of interest, HWAttention boosts the model's performance across diverse environments while remaining efficient under real-time constraints.

A further contribution of MTEPN is the cross-task information aggregation module, CFAS, which promotes information complementarity between tasks by implicitly modeling the global context relationships among the different visual tasks. Integrating this complementary pattern information deepens feature sharing and improves the recognition accuracy of each individual task, fostering a synergistic relationship among the tasks and enabling the model to operate more effectively than methods that treat the tasks in isolation.

Additionally, a decoupled task head module processes the three perceptual objectives independently. This design increases the model's flexibility and sharpens the focus on each task, allowing tailored optimization strategies that enhance overall performance.

Experimental evaluations on the BDD100k public dataset show that MTEPN achieves a mean Average Precision (mAP) of 79.4% for vehicle target detection and a mean Intersection-over-Union (mIoU) of 92.4% for drivable area extraction.
Both metrics surpass those of existing mainstream multi-task perception algorithms with comparable parameter scales. Furthermore, the lane line segmentation accuracy, measured by IoU, reaches 27.2%, the second-best result among the compared algorithms. Importantly, MTEPN maintains a modest parameter count of only 7.9 million and processes a single frame in just 24.3 milliseconds, demonstrating its suitability for real-time autonomous driving applications, where both speed and accuracy are paramount. The code for this algorithm will be made publicly available at https://github.com/XMUT-Vsion-Lab.
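Since the official code has not yet been released, the sketch below is only an illustration of the orthogonal height/width attention idea that HWAttention describes: pooling the feature map along each spatial axis separately so the attention cost scales with H + W rather than H x W, in the spirit of coordinate attention. The class name, layer composition, and reduction ratio are assumptions made for this example, not the paper's implementation.

```python
import torch
import torch.nn as nn


class OrthogonalHWAttention(nn.Module):
    """Illustrative sketch of height/width-factorized (orthogonal) attention.

    Assumed structure, not the released HWAttention module: each spatial axis
    is pooled independently, passed through a shared bottleneck, and turned
    into per-axis attention maps that reweight the input feature map.
    """

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        hidden = max(channels // reduction, 8)
        # Pool along one spatial axis at a time: cost grows with H + W, not H * W.
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # (B, C, 1, W)
        self.reduce = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
        )
        self.attn_h = nn.Conv2d(hidden, channels, kernel_size=1)
        self.attn_w = nn.Conv2d(hidden, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Encode each axis separately, then share one bottleneck projection.
        feat_h = self.pool_h(x)                      # (B, C, H, 1)
        feat_w = self.pool_w(x).permute(0, 1, 3, 2)  # (B, C, W, 1)
        y = self.reduce(torch.cat([feat_h, feat_w], dim=2))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = torch.sigmoid(self.attn_h(y_h))                      # (B, C, H, 1)
        a_w = torch.sigmoid(self.attn_w(y_w.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        # Reweight the input with the two orthogonal attention maps.
        return x * a_h * a_w


if __name__ == "__main__":
    x = torch.randn(2, 64, 80, 80)
    print(OrthogonalHWAttention(64)(x).shape)  # torch.Size([2, 64, 80, 80])
```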