Article title

Vehicle detection in surveillance videos based on YOLOv5 lightweight network

Full text
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
The development of surveillance-video vehicle detection technology in modern intelligent transportation systems is closely tied to the operation and safety of highways and urban road networks. However, current object detection networks have complex structures and require large numbers of parameters and computations, so this paper proposes a lightweight network based on YOLOv5. It can be easily deployed on video surveillance equipment, even with limited performance, while ensuring real-time and accurate vehicle detection. A modified MobileNetV2 is used as the backbone feature extraction network of YOLOv5, and depthwise separable convolution (DSC) replaces the standard convolution in the bottleneck layer structure. The lightweight YOLOv5 is evaluated on the UA-DETRAC and BDD100k datasets. Experimental results show that this method reduces the number of parameters by 95% compared with the original YOLOv5s and achieves a good trade-off between precision and speed.
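As a rough illustration of the parameter savings the abstract refers to, the sketch below shows a depthwise separable convolution block in PyTorch (the framework YOLOv5 is implemented in). This is a minimal sketch, not the authors' code: the class name, channel counts, stride, and the SiLU activation are illustrative assumptions rather than values taken from the paper.

```python
# Minimal sketch (assumed names and settings): a depthwise separable convolution (DSC)
# block of the kind that could replace a standard 3x3 convolution in a MobileNetV2-style
# bottleneck used as a YOLOv5 backbone.
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1):
        super().__init__()
        # Depthwise 3x3: one filter per input channel (groups=in_ch), no channel mixing.
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        # Pointwise 1x1: mixes channels and maps in_ch -> out_ch.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.act = nn.SiLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.act(self.bn1(self.depthwise(x)))
        return self.act(self.bn2(self.pointwise(x)))


if __name__ == "__main__":
    x = torch.randn(1, 64, 80, 80)          # example feature map
    std = nn.Conv2d(64, 128, 3, padding=1)  # standard 3x3 convolution, for comparison
    dsc = DepthwiseSeparableConv(64, 128)
    n_std = sum(p.numel() for p in std.parameters())
    n_dsc = sum(p.numel() for p in dsc.parameters())
    print(dsc(x).shape, f"params: standard={n_std}, dsc={n_dsc}")
```

Factoring a standard convolution into a per-channel depthwise step followed by a 1x1 pointwise step is what drives the parameter and computation reduction: in this example the DSC block has roughly an order of magnitude fewer weights than the standard 3x3 convolution with the same input and output channels.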
Year
Pages
art. no. e143644
Physical description
Bibliography: 35 items, figures, tables
Authors
author
  • Shanghai University of Engineering Science, School of Mechanical and Automotive Engineering, Shanghai, China
author
  • Shanghai University of Engineering Science, School of Mechanical and Automotive Engineering, Shanghai, China
author
  • Shanghai University of Engineering Science, School of Mechanical and Automotive Engineering, Shanghai, China
Bibliography
  • [1] L. Qiu et al., “Deep learning-based algorithm for vehicle detection in intelligent transportation systems,” J. Supercomput., vol. 77, no. 10, pp. 11083–11098, 2021, doi: 10.1007/s11227-021-03712-9.
  • [2] J. Zhao et al., “Improved vision-based vehicle detection and classification by optimized YOLOv4,” IEEE Access, vol. 10, pp. 8590–8603, 2022, doi: 10.1109/ACCESS.2022.3143365.
  • [3] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Proc. 3rd Int. Conf. Learn. Represent. (ICLR), 2015, doi: 10.48550/ARXIV.1409.1556.
  • [4] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “MobileNetV2: Inverted residuals and linear bottlenecks,” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 4510–4520, doi: 10.1109/CVPR.2018.00474.
  • [5] P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,” Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, 2001, pp. I–I, doi: 10.1109/CVPR.2001.990517.
  • [6] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), 2005, vol. 1, pp. 886–893, doi: 10.1109/CVPR.2005.177.
  • [7] P.F. Felzenszwalb, R.B. Girshick, D. McAllester, and D. Ramanan, “Object detection with discriminatively trained part-based models,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 32, no. 9, pp. 1627–1645, Sept. 2010, doi: 10.1109/TPAMI.2009.167.
  • [8] H. Fuhrmann, A. Boyko, M.H. Abdelpakey, and M.S. Shehata, “DETECTren: Vehicle object detection using self-supervised learning based on light-weight network for low-power devices,” 2021 IEEE 7th World Forum on Internet of Things (WF-IoT), 2021, pp. 807–811, doi: 10.1109/WF-IoT51360.2021.9594927.
  • [9] L. Jiao et al., “A survey of deep learning-based object detection,” in IEEE Access, vol. 7, pp. 128837–128868, 2019, doi: 10.1109/ACCESS.2019.2939201.
  • [10] R. Girshick, J. Donahue, T. Darrell and J. Malik, “Rich feature hierarchies for accurate object detection and semantic segmentation,” 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580–587, doi: 10.1109/CVPR.2014.81.
  • [11] K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” in IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 9, pp. 1904–1916, 2015, doi: 10.1109/TPAMI.2015.2389824.
  • [12] R. Girshick, “Fast R-CNN,” in Proc. IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1440–1448, doi: 10.48550/arXiv.1504.08083.
  • [13] J. Redmon, S. Divvala, R. Girshick and A. Farhadi, “You only look once: Unified, real-time object detection,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779–788, doi: 10.1109/CVPR.2016.91.
  • [14] J. Redmon and A. Farhadi, “YOLO9000: Better, faster, stronger,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 6517–6525, doi: 10.1109/CVPR.2017.690.
  • [15] J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” arXiv, 2018, doi: 10.48550/ARXIV.1804.02767.
  • [16] A. Bochkovskiy, Ch.-Y. Wang, and H.-Y.M. Liao, “YOLOv4: Optimal speed and accuracy of object detection,” arXiv, 2020, doi: 10.48550/ARXIV.2004.10934.
  • [17] Ch.-Y. Wang, I-H. Yeh, and H.-Y.M. Liao, “You only learn one representation: Unified network for multiple tasks,” arXiv, 2021, doi: 10.48550/ARXIV.2105.04206.
  • [18] Z. Ge, S. Liu, F. Wang, Z. Li, and J. Sun, “YOLOX: Exceeding YOLO series in 2021,” arXiv, 2021, doi: 10.48550/ARXIV.2107.08430.
  • [19] Ch.-Y. Wang, A. Bochkovskiy, and H.-Y.M. Liao, “Scaled-YOLOv4: Scaling cross stage partial network,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2021, pp. 13029–13038, doi: 10.48550/arXiv.2011.08036.
  • [20] N. Carion et al., “End-to-end object detection with transformers,” in Proc. European Conference on Computer Vision (ECCV), 2020, pp. 213–229, doi: 10.1007/978-3-030-58452-8_13.
  • [21] X. Zhu et al., “Deformable DETR: Deformable transformers for end-to-end object detection,” arXiv, 2020, doi: 10.48550/ARXIV.2010.04159.
  • [22] M. Zheng et al., “End-to-end object detection with adaptive clustering transformer,” arXiv, 2020, doi: 10.48550/ARXIV.2011.09315.
  • [23] F.N. Iandola et al., “SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size,” arXiv, 2016, doi: 10.48550/ARXIV.1602.07360.
  • [24] A. Gholami et al., “SqueezeNext: Hardware-aware neural network design,” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2018, pp. 1719–171909, doi: 10.1109/CVPRW.2018.00215.
  • [25] X. Zhang, X. Zhou, M. Lin, and J. Sun, “ShuffleNet: An extremely efficient convolutional neural network for mobile devices,” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 6848–6856, doi: 10.1109/CVPR.2018.00716.
  • [26] N. Ma, X. Zhang, H.T. Zheng, and J. Sun, “ShuffleNet V2: Practical guidelines for efficient CNN architecture design,” in Proc. European Conference on Computer Vision (ECCV), 2018, pp. 122–138, doi: 10.1007/978-3-030-01264-9_8.
  • [27] A.G. Howard et al., “MobileNets: Efficient convolutional neural networks for mobile vision applications,” arXiv, 2017, doi: 10.48550/arXiv.1704.04861.
  • [28] M. Sandler et al., “MobileNetV2: Inverted residuals and linear bottlenecks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2018, pp. 4510–4520, doi: 10.1109/CVPR.2018.00474.
  • [29] A. Howard et al., “Searching for MobileNetV3,” in Proc. 17th IEEE Int. Conf. Comput. Vis. (ICCV), 2019, pp. 1314–1324, doi: 10.48550/arXiv.1905.02244.
  • [30] K. Han, Y. Wang, Q. Tian, J. Guo, C. Xu, and C. Xu, “GhostNet: More features from cheap operations,” 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 1577–1586, doi: 10.1109/CVPR42600.2020.00165.
  • [31] M. Tan and Q.V. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” in Proc. 36th International Conference on Machine Learning (ICML), 2019, pp. 6105–6114, doi: 10.48550/arXiv.1905.11946.
  • [32] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778, doi: 10.1109/CVPR.2016.90.
  • [33] F. Chollet, “Xception: Deep learning with depthwise separable convolutions,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 1800–1807, doi: 10.1109/CVPR.2017.195.
  • [34] L. Wen et al., “UA-DETRAC: A new benchmark and protocol for multi-object detection and tracking,” Comput. Vis. Image Underst., vol. 193, p. 102907, 2020, doi: 10.1016/j.cviu.2020.102907.
  • [35] F. Yu et al., “BDD100K: A diverse driving dataset for heterogeneous multitask learning,” 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020, pp. 2633–2642, doi: 10.1109/CVPR42600.2020.00271.
Remarks
Record developed with funds from the Ministry of Education and Science (MEiN), agreement no. SONP/SP/546092/2022, under the "Społeczna odpowiedzialność nauki" (Social Responsibility of Science) programme, module: popularisation of science and promotion of sport (2022-2023).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-96494063-5841-4dd3-b763-57ea6fd3fe53