Enhanced YOLOv8 with BiFPN and MHSA for traffic vehicle detection
Authors
- Ngo Truong An, Master's Student, Computer Science, The University of Danang - University of Science and Technology, Vietnam; Dong A University, Vietnam
- Huynh Huu Hung, The University of Danang - University of Science and Technology, Vietnam
- Tran Thi Hoang Oanh, Dong A University, Vietnam
Keywords:
Abstract
Accurately detecting vehicles in urban traffic scenes is a complex task, especially amid cluttered backgrounds, diverse object scales, and high vehicle density. In this study, we propose an improved YOLOv8-based model tailored for vehicle detection in such challenging environments. The enhancement lies in the integration of a Bidirectional Feature Pyramid Network (BiFPN), which strengthens multi-scale feature fusion, and a Multi-Head Self-Attention (MHSA) module, which broadens the model's ability to capture spatial context. Together, these components help the model better distinguish densely arranged vehicles. We conducted in-depth experiments on the Vehicles-COCO dataset, and the results demonstrate that our YOLOv8-BiFPN-MHSA variant outperforms the original YOLOv8 in both Precision and mAP. Our model achieves significantly higher mAP@0.5 and mAP@0.5:0.95, along with an overall improvement in detection performance. These enhancements highlight the stability, efficiency, and strong potential of the proposed model for real-world traffic monitoring systems.
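To make the two additions concrete, the sketch below illustrates the core operations behind each component: BiFPN's fast normalized (weighted) feature fusion and a toy multi-head self-attention pass. This is our own minimal NumPy illustration, not the paper's implementation; the identity Q/K/V projections and the tensor shapes are simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bifpn_fuse(features, weights, eps=1e-4):
    """Fast normalized fusion, BiFPN-style:
    out = sum_i(w_i * F_i) / (sum_i(w_i) + eps),
    where the learnable scalar weights are clipped non-negative."""
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)
    num = sum(wi * f for wi, f in zip(w, features))
    return num / (w.sum() + eps)

def mhsa(x, num_heads):
    """Toy multi-head self-attention on x of shape (tokens, dim).
    For simplicity, Q = K = V = the head's slice of x (identity projections)."""
    t, d = x.shape
    dh = d // num_heads  # per-head dimension
    heads = []
    for h in range(num_heads):
        q = k = v = x[:, h * dh:(h + 1) * dh]
        attn = softmax(q @ k.T / np.sqrt(dh), axis=-1)  # (tokens, tokens)
        heads.append(attn @ v)
    return np.concatenate(heads, axis=1)  # back to (tokens, dim)

# Fuse two same-resolution feature maps with equal weights:
f1 = np.ones((4, 4))
f2 = 3.0 * np.ones((4, 4))
fused = bifpn_fuse([f1, f2], [1.0, 1.0])  # ~ elementwise mean of f1, f2

# Run MHSA over 6 spatial tokens of dimension 8, split into 2 heads:
tokens = np.random.default_rng(0).normal(size=(6, 8))
out = mhsa(tokens, num_heads=2)
print(fused[0, 0], out.shape)
```

In the full detector these operations sit inside the neck (BiFPN replacing the plain top-down/bottom-up fusion) and backbone (MHSA over flattened feature-map positions); the sketch only isolates the arithmetic each one performs.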

