Article title

Optimizing pedestrian tracking for robust perception with YOLOv8 and deep SORT

Full text / Content
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
Multi-object tracking is a crucial aspect of perception in computer vision, widely used in autonomous driving, behavior recognition, and other areas. The complex and dynamic nature of real-world environments, the ever-changing visual appearance of pedestrians, and frequent occlusions all limit the efficacy of existing pedestrian tracking algorithms, resulting in suboptimal tracking precision and stability. As a solution, this article proposes an integrated detector-tracker framework for pedestrian tracking. The framework builds its pedestrian detector on the YOLOv8 network, a recent state-of-the-art detector, which provides a strong detection base for addressing these limitations. By combining YOLOv8 with the Deep SORT tracking algorithm, we improve the ability to track pedestrians in dynamic scenarios. Experiments on the publicly available MOT17 and MOT20 datasets demonstrate a clear improvement in accuracy and consistency, with MOTA scores of 63.82 and 58.95 and HOTA scores of 43.15 and 41.36, respectively. Our research highlights the importance of optimizing object detection to unlock the potential of tracking in critical applications such as autonomous driving.
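The framework described in the abstract follows the standard tracking-by-detection pattern: YOLOv8 supplies per-frame pedestrian detections, and Deep SORT associates them across frames using motion (a Kalman filter) and appearance cues. The following is a minimal sketch of that loop, assuming the open-source ultralytics and deep-sort-realtime Python packages; it illustrates the general detect-then-track pattern only, not the authors' exact configuration, and the video path and tracker parameters are placeholders.

import cv2
from ultralytics import YOLO
from deep_sort_realtime.deepsort_tracker import DeepSort

detector = YOLO("yolov8n.pt")      # any YOLOv8 checkpoint; nano chosen here for speed
tracker = DeepSort(max_age=30)     # keep lost tracks alive for 30 frames (assumed value)

cap = cv2.VideoCapture("pedestrians.mp4")  # hypothetical input video
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # Detect only the COCO "person" class (index 0).
    result = detector(frame, classes=[0], verbose=False)[0]
    detections = []
    for box, conf in zip(result.boxes.xyxy.tolist(), result.boxes.conf.tolist()):
        x1, y1, x2, y2 = box
        # Deep SORT expects ([left, top, width, height], confidence, class).
        detections.append(([x1, y1, x2 - x1, y2 - y1], conf, "person"))
    # Associate detections with existing tracks via Kalman motion
    # prediction plus an appearance embedding.
    tracks = tracker.update_tracks(detections, frame=frame)
    for t in tracks:
        if not t.is_confirmed():
            continue  # skip tentative tracks that lack enough evidence
        x1, y1, x2, y2 = map(int, t.to_ltrb())
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, f"ID {t.track_id}", (x1, y1 - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
cap.release()

For reference, the MOTA scores reported above follow the CLEAR MOT definition, MOTA = 1 - (FN + FP + IDSW) / GT, while HOTA [20] jointly balances detection and association accuracy.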
Year
Pages
72-84
Physical description
Bibliography: 39 items, figures, tables
Authors
  • University of Mostefa Ben Boulaid, Department of Pharmacy, Algeria
  • University of Kasdi Merbah, Department of Electrical Engineering, Algeria
  • University of Kasdi Merbah, Department of Electrical Engineering, Algeria
  • University of Kasdi Merbah, Department of Electrical Engineering, Algeria
Bibliography
  • [1] Abbas, S. M., & Singh, S. (2018). Region-based object detection and classification using faster R-CNN. 4th International Conference on Computational Intelligence & Communication Technology (CICT) (pp. 1-6). IEEE. https://doi.org/10.1109/ciact.2018.8480413
  • [2] Behrendt, K., Novak, L., & Botros, R. (2017). A deep learning approach to traffic lights: Detection, tracking, and classification. IEEE International Conference on Robotics and Automation (ICRA) (pp. 1370-1377). IEEE. https://doi.org/10.1109/ICRA.2017.7989163
  • [3] Bergmann, P., Meinhardt, T., & Leal-Taixé, L. (2019). Tracking without bells and whistles. ArXiv, abs/1903.05625. https://doi.org/10.48550/arXiv.1903.05625
  • [4] Bewley, A., Ge, Z., Ott, L., Ramos, F., & Upcroft, B. (2016). Simple online and realtime tracking. IEEE International Conference on Image Processing (ICIP) (pp. 3464-3468). IEEE. https://doi.org/10.1109/ICIP.2016.7533003
  • [5] Bochinski, E., Eiselein, V., & Sikora, T. (2017). High-speed tracking-by-detection without using image information. 14th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS) (pp. 1-6). IEEE. https://doi.org/10.1109/AVSS.2017.8078516
  • [6] Chen, L., Ai, H., Zhuang, Z., & Shang, C. (2018). Real-time multiple people tracking with deeply learned candidate selection and person re-identification. IEEE International Conference on Multimedia and Expo (ICME) (pp. 1-6). IEEE. https://doi.org/10.1109/ICME.2018.8486597
  • [7] Cheng, Y. (1995). Mean shift, mode seeking, and clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(8), 790-799. https://doi.org/10.1109/34.400568
  • [8] Ciaparrone, G., Sánchez, F. L., Tabik, S., Troiano, L., Tagliaferri, R., & Herrera, F. (2020). Deep learning in video multi-object tracking: A survey. Neurocomputing, 381, 61–88. https://doi.org/10.1016/j.neucom.2019.11.023
  • [9] De Rosa, G. H., & Papa, J. P. (2022). Learning to weight similarity measures with Siamese networks: A case study on optimum-path forest. In Optimum-Path Forest (pp. 155–173). Elsevier. https://doi.org/10.1016/B978-0-12-822688-9.00015-3
  • [10] Ess, A., Schindler, K., Leibe, B., & Van Gool, L. (2010). Object detection and tracking for autonomous navigation in dynamic environments. The International Journal of Robotics Research, 29(14), 1707-1725. https://doi.org/10.1177/0278364910365417
  • [11] Feng, W., Bai, L., Yao, Y., Gan, W., Wu, W., & Ouyang, W. (2023). Similarity- and quality-guided relation learning for joint detection and tracking. IEEE Transactions on Multimedia, 26, 1267-1280. https://doi.org/10.1109/tmm.2023.3279670
  • [12] Girshick, R., Donahue, J., Darrell, T., & Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. ArXiv, abs/1311.2524. https://doi.org/10.48550/arXiv.1311.2524
  • [13] Jocher, G., Chaurasia, A., & Qiu, J. (2023). YOLO by Ultralytics. Retrieved February 2, 2024, from https://github.com/ultralytics/ultralytics
  • [14] Kalman, R. E. (1960). A new approach to linear filtering and prediction problems. Journal of Basic Engineering, 82(1), 35-45. https://doi.org/10.1115/1.3662552
  • [15] Kamal, R., Chemmanam, A. J., Jose, B., Mathews, S., & Varghese, E. (2020). Construction safety surveillance using machine learning. International Symposium on Networks, Computers and Communications (ISNCC) (pp. 1-6). IEEE. https://doi.org/10.1109/ISNCC49221.2020.9297198
  • [16] Kasturi, R., Goldgof, D., Soundararajan, P., Manohar, V., Garofolo, J., Bowers, R., Boonstra, M., Korzhova, V., & Zhang, J. (2009). Framework for performance evaluation of face, text, and vehicle detection and tracking in video: Data, metrics, and protocol. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2), 319-336. https://doi.org/10.1109/TPAMI.2008.57
  • [17] Korepanova, A. A., Oliseenko, V. D., & Abramov, M. V. (2020). Applicability of similarity coefficients in social circle matching. XXIII International Conference on Soft Computing and Measurements (SCM) (pp. 41-43). IEEE. https://doi.org/10.1109/SCM50615.2020.9198782
  • [18] Kuhn, H. W. (1955). The Hungarian method for the assignment problem. Naval Research Logistics Quarterly, 2(1-2), 83-97. https://doi.org/10.1002/nav.3800020109
  • [19] Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C. Y., & Berg, A. C. (2016). SSD: Single shot MultiBox detector. In B. Leibe, J. Matas, N. Sebe, & M. Welling (Eds.), Computer Vision – ECCV 2016 (Vol. 9905, pp. 21–37). Springer International Publishing. https://doi.org/10.1007/978-3-319-46448-0_2
  • [20] Luiten, J., Osep, A., Dendorfer, P., Torr, P., Geiger, A., Leal-Taixé, L., & Leibe, B. (2021). HOTA: A higher order metric for evaluating multi-object tracking. International Journal of Computer Vision, 129, 548-578. https://doi.org/10.1007/s11263-020-01416-9
  • [21] Luo, W., Xing, J., Milan, A., Zhang, X., Liu, W., & Kim, T. K. (2021). Multiple object tracking: A literature review. Artificial Intelligence, 293, 103448. https://doi.org/10.1016/j.artint.2020.103448
  • [22] Mao, Q. C., Sun, H. M., Liu, Y. B., & Jia, R. S. (2019). Mini-YOLOv3: Real-time object detector for embedded applications. IEEE Access, 7, 133529–133538. https://doi.org/10.1109/ACCESS.2019.2941547
  • [23] Munjal, B., Aftab, A. R., Amin, S., Brandlmaier, M. D., Tombari, F., & Galasso, F. (2020). Joint detection and tracking in videos with identification features. Image and Vision Computing, 100, 103932. https://doi.org/10.1016/j.imavis.2020.103932
  • [24] Okuma, K., Taleghani, A., De Freitas, N., Little, J. J., & Lowe, D. G. (2004). A boosted particle filter: Multitarget detection and tracking. In T. Pajdla & J. Matas (Eds.), Computer Vision—ECCV 2004 (Vol. 3021, pp. 28–39). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-540-24670-1_3
  • [25] Pang, B., Li, Y., Zhang, Y., Li, M., & Lu, C. (2020). TubeTK: Adopting tubes to track multi-object in a one-step training model. ArXiv, abs/2006.05683. https://doi.org/10.48550/arXiv.2006.05683
  • [26] Peng, J., Wang, C., Wan, F., Wu, Y., Wang, Y., Tai, Y., Wang, C., Li, J., Huang, F., & Fu, Y. (2020). Chained-tracker: Chaining paired attentive regression results for end-to-end joint multiple-object detection and tracking. In A. Vedaldi, H. Bischof, T. Brox, & J.-M. Frahm (Eds.), Computer Vision – ECCV 2020 (Vol. 12349, pp. 145–161). Springer International Publishing. https://doi.org/10.1007/978-3-030-58548-8_9
  • [27] Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You only look once: Unified, real-time object detection. ArXiv, abs/1506.02640. https://doi.org/10.48550/arXiv.1506.02640
  • [28] Ristani, E., Solera, F., Zou, R., Cucchiara, R., & Tomasi, C. (2016). Performance measures and a data set for multi-target, multi-camera tracking. ArXiv, abs/1609.01775. https://doi.org/10.48550/arXiv.1609.01775
  • [29] Solawetz, J., & Francesco. (2023, January 11). What is YOLOv8? The ultimate guide. https://blog.roboflow.com/whats-new-in-yolov8/
  • [30] Sun, Z., Chen, J., Chao, L., Ruan, W., & Mukherjee, M. (2021). A survey of multiple pedestrian tracking based on tracking-by-detection framework. IEEE Transactions on Circuits and Systems for Video Technology, 31(5), 1819-1833. https://doi.org/10.1109/TCSVT.2020.3009717
  • [31] Terven, J., Córdova-Esparza, D. M., & Romero-González, J. A. (2023). A comprehensive review of YOLO architectures in computer vision: From YOLOv1 to YOLOv8 and YOLO-NAS. Machine Learning and Knowledge Extraction, 5(4), 1680-1716. https://doi.org/10.3390/make5040083
  • [32] Vijaymeena, M., & Kavitha, K. (2016). A survey on similarity measures in text mining. Machine Learning Applications: An International Journal, 3(1), 19-28.
  • [33] Wang, Y., Kitani, K., & Weng, X. (2021). Joint object detection and multi-object tracking with graph neural networks. IEEE International Conference on Robotics and Automation (ICRA) (pp. 13708-13715). IEEE. https://doi.org/10.1109/icra48506.2021.9561110
  • [34] Wang, Z., Zheng, L., Liu, Y., Li, Y., & Wang, S. (2020). Towards real-time multi-object tracking. In A. Vedaldi, H. Bischof, T. Brox, & J.-M. Frahm (Eds.), Computer Vision – ECCV 2020 (Vol. 12356, pp. 107–122). Springer International Publishing. https://doi.org/10.1007/978-3-030-58621-8_7
  • [35] Wojke, N., Bewley, A., & Paulus, D. (2017). Simple online and realtime tracking with a deep association metric. IEEE International Conference on Image Processing (ICIP) (pp. 3645-3649). IEEE. https://doi.org/10.1109/ICIP.2017.8296962
  • [36] Xu, Y., Ošep, A., Ban, Y., Horaud, R., Leal-Taixé, L., & Alameda-Pineda, X. (2019). How to train your deep multi-object tracker. ArXiv, abs/1906.06618. https://doi.org/10.48550/arXiv.1906.06618
  • [37] Yu, F., Li, W., Li, Q., Liu, Y., Shi, X., & Yan, J. (2016). POI: Multiple object tracking with high performance detection and appearance feature. ArXiv, abs/1610.06136. https://doi.org/10.48550/arXiv.1610.06136
  • [38] Zeng, F., Dong, B., Zhang, Y., Wang, T., Zhang, X., & Wei, Y. (2022). MOTR: End-to-end multiple object tracking with transformer. ArXiv, abs/2105.03247. https://doi.org/10.48550/arXiv.2105.03247
  • [39] Zhang, Y., Sun, P., Jiang, Y., Yu, D., Weng, F., Yuan, Z., Luo, P., Liu, W., & Wang, X. (2022). ByteTrack: Multi-object tracking by associating every detection box. ArXiv, abs/2110.06864. https://doi.org/10.48550/arXiv.2110.06864
Document type
YADDA identifier
bwmeta1.element.baztech-d335fc3e-f166-4917-8dd0-b363247b5743