Article title

Efficient face detection based crowd density estimation using convolutional neural networks and an improved sliding window strategy

Content
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
Counting and detecting occluded faces in a crowd is a challenging task in computer vision. In this paper, we propose a new approach to face detection-based crowd density estimation under significant occlusion and head posture variations. Most state-of-the-art face detectors cannot detect excessively occluded faces. To address this problem, an improved approach to training various detectors is described. To evaluate our solution fairly, we trained and tested the model on our substantially occluded dataset. The dataset contains images with up to 90 degrees of out-of-plane rotation and faces with 25%, 50%, and 75% occlusion levels. In this study, we trained the proposed model on 48,000 images obtained from our dataset, which consists of 19 crowd scenes. To evaluate the model, we used 109 images with face counts ranging from 21 to 905 and an average of 145 individuals per image. The challenges inherent in detecting faces in crowded scenes cannot be addressed with a single face detection method. Therefore, a robust method for counting visible faces in a crowd is proposed by combining traditional machine learning and convolutional neural network algorithms. Using a network based on the VGGNet architecture, the proposed algorithm outperforms various state-of-the-art algorithms in detecting faces ‘in-the-wild’. In addition, the performance of the proposed approach is evaluated on publicly available datasets containing in-plane/out-of-plane rotation images as well as images with various lighting changes, on which it achieves similar or higher accuracy.
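The abstract describes a pipeline that combines a sliding-window scan with a VGG-style CNN classifier to count visible faces. The Python sketch below (using Keras and OpenCV, both of which appear in the bibliography) only illustrates the general idea; the window size, stride, score threshold, network layout, and the use of OpenCV's NMSBoxes for merging overlapping windows are illustrative assumptions, not details taken from the paper.

import numpy as np
import cv2                              # OpenCV [34]
from tensorflow import keras
from tensorflow.keras import layers     # Keras [19]

def build_vgg_style_classifier(input_shape=(64, 64, 3)):
    # Small VGG-like face / non-face classifier (hypothetical layout,
    # not the network trained by the authors).
    model = keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.BatchNormalization(),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),  # P(window contains a face)
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    return model

def sliding_windows(image, win=64, stride=16):
    # Yield (x, y, patch) for every window position on a dense grid.
    h, w = image.shape[:2]
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            yield x, y, image[y:y + win, x:x + win]

def count_faces(image_bgr, model, win=64, stride=16, threshold=0.9):
    # Classify every window, keep high-scoring ones, merge overlapping
    # windows with non-maximum suppression, and return the face count.
    coords, patches = [], []
    for x, y, patch in sliding_windows(image_bgr, win, stride):
        coords.append((x, y))
        patches.append(cv2.resize(patch, (64, 64)) / 255.0)
    if not patches:
        return 0
    probs = model.predict(np.asarray(patches), verbose=0).ravel()
    boxes = [[x, y, win, win] for (x, y), p in zip(coords, probs) if p >= threshold]
    scores = [float(p) for p in probs if p >= threshold]
    keep = cv2.dnn.NMSBoxes(boxes, scores, threshold, 0.3) if boxes else []
    return len(keep)

A full implementation would additionally scan the image at several scales and rely on a detector trained with the occlusion-aware strategy described in the abstract, rather than the toy classifier sketched here.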
Year
Pages
7-20
Physical description
Bibliography: 55 items, figures, tables, charts
Authors
  • Institute of Telecommunications, AGH University of Science and Technology, al. Mickiewicza 30, 30-059 Kraków, Poland
  • Institute of Telecommunications, AGH University of Science and Technology, al. Mickiewicza 30, 30-059 Kraków, Poland
  • Institute of Telecommunications, AGH University of Science and Technology, al. Mickiewicza 30, 30-059 Kraków, Poland
  • Institute of Telecommunications, AGH University of Science and Technology, al. Mickiewicza 30, 30-059 Kraków, Poland
  • Department of Computer Science, Electronics and Electrical Engineering, Kielce University of Technology, ul. Zeromskiego 5, 25-369 Kielce, Poland
Bibliography
  • [1] Buades, A., Coll, B. and Morel, J.-M. (2011). Non-local means denoising, Image Processing On Line 1: 208-212, DOI: 10.5201/ipol.2011.bcm_nlm.
  • [2] Cevikalp, H. and Triggs, B. (2012). Efficient object detection using cascades of nearest convex model classifiers, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Providence, USA, pp. 3138-3145.
  • [3] Chan, A.B., Liang, Z.-S.J. and Vasconcelos, N. (2008). Privacy preserving crowd monitoring: Counting people without people models or tracking, 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, USA, pp. 1-7.
  • [4] Chen, D., Ren, S., Wei, Y., Cao, X. and Sun, J. (2014). Joint cascade face detection and alignment, in D. Fleet et al. (Eds), Computer Vision-ECCV 2014, Springer, Cham, pp. 109-122.
  • [5] Chen, J., Deng, Y., Bai, G. and Su, G. (2015). Face image quality assessment based on learning to rank, IEEE Signal Processing Letters 22(1): 90-94.
  • [6] Chen, K., Loy, C.C., Gong, S. and Xiang, T. (2012). Feature mining for localised crowd counting, British Machine Vision Conference (BMVC), Surrey, UK.
  • [7] Conte, D., Foggia, P., Percannella, G., Tufano, F. and Vento, M. (2010). A method for counting people in crowded scenes, 7th IEEE International Conference on Advanced Video and Signal Based Surveillance, Boston, USA, pp. 225-232.
  • [8] Davies, A.C., Yin, J.H. and Velastin, S.A. (1995). Crowd monitoring using image processing, Electronics Communication Engineering Journal 7(1): 37-47.
  • [9] Face++ (2015). Face detection software, http://www.faceplusplus.com.
  • [10] Ferryman, J. and Ellis, A.-L. (2010). PETS2010: Dataset and challenge, 7th IEEE International Conference on Advanced Video and Signal Based Surveillance, Boston, USA, Vol. 1, pp. 143-150.
  • [11] Fradi, H. and Dugelay, J. (2012). Low level crowd analysis using frame-wise normalized feature for people counting, 2012 IEEE International Workshop on Information Forensics and Security (WIFS), Costa Adeje, Spain, pp. 246-251.
  • [12] Ghiasi, G. and Fowlkes, C. (2014). Occlusion coherence: Localizing occluded faces with a hierarchical deformable part model, Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Columbus, USA, pp. 1899-1906.
  • [13] Gourier, N., Hall, D. and Crowley, J.L. (2004). Estimating face orientation from robust detection of salient facial features, ICPR International Workshop on Visual Observation of Deictic Gestures, Cambridge, UK, pp. 17-25.
  • [14] Han, D., Kim, J., Ju, J., Lee, I., Cha, J. and Kim, J. (2014). Efficient and fast multi-view face detection based on feature transformation, 16th International Conference on Advanced Communication Technology, Pyeongchang, Korea (South), pp. 682-686.
  • [15] Idrees, H., Saleemi, I., Seibert, C. and Shah, M. (2013). Multi-source multi-scale counting in extremely dense crowd images, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Portland, USA, pp. 2547-2554.
  • [16] Ioffe, S. and Szegedy, C. (2015). Batch normalization: Accelerating deep network training by reducing internal covariate shift, CoRR: abs/1502.03167, http://arxiv.org/abs/1502.03167.
  • [17] Jiang, H. and Learned-Miller, E. (2017). Face detection with the Faster R-CNN, 12th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2017), Washington, USA, pp. 650-657.
  • [18] Jones, M. and Viola, P. (2003). Fast multi-view face detection, Technical Report TR2003-96, MERL-Mitsubishi Electric Research Laboratories, Cambridge, http://www.merl.com/publications/TR2003-96/.
  • [19] Keras (2015). Keras extras code, https://github.com/keras-team/keras.
  • [20] Khatoon, R., Saqlain, S.M. and Bibi, S. (2012). A robust and enhanced approach for human detection in crowd, 2012 15th International Multitopic Conference (INMIC), Islamabad, Pakistan, pp. 215-221.
  • [21] Kian Ara, R. and Matiolanski, A. (2019). AGH Crowd Density Estimation Database (ACD), http://kt.agh.edu.pl/matiolanski/CrowdDensityEstimationDatabase/.
  • [22] King, D.E. (2009). Dlib-ml: A machine learning toolkit, Journal of Machine Learning Research 10: 1755-1758.
  • [23] King, D.E. (2015). Max-margin object detection, CoRR: abs/1502.00046, http://arxiv.org/abs/1502.00046.
  • [24] Kong, D., Gray, D. and Tao, H. (2005). Counting pedestrians in crowds using viewpoint invariant training, BMVC 2005: Proceedings of the British Machine Vision Conference, Oxford, UK.
  • [25] Kotan, M., Oz, C. and Kahraman, A. (2021). A linearization-based hybrid approach for 3D reconstruction of objects in a single image, International Journal of Applied Mathematics and Computer Science 31(3): 501-513, DOI: 10.34768/amcs-2021-0034.
  • [26] Li, J. and Zhang, Y. (2013). Learning SURF cascade for fast and accurate object detection, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Portland, USA, pp. 3468-3475.
  • [27] Li, S.Z., Zhu, L., Zhang, Z., Blake, A., Zhang, H. and Shum, H. (2002). Statistical learning of multi-view face detection, in A. Heyden et al. (Eds), Computer Vision-ECCV 2002, Springer, Berlin, pp. 67–81.
  • [28] Liao, S., Jain, A.K. and Li, S.Z. (2016). A fast and accurate unconstrained face detector, IEEE Transactions on Pattern Analysis and Machine Intelligence 38(2): 211-223.
  • [29] Liu, S., Dong, Y., Liu, W. and Zhao, J. (2012). Multi-view face detection based on cascade classifier and skin color, 2012 IEEE 2nd International Conference on Cloud Computing and Intelligence Systems, Hangzhou, China, Vol. 01, pp. 56-60.
  • [30] Ma, R., Li, L., Huang, W. and Tian, Q. (2004). On pixel count based crowd density estimation for visual surveillance, IEEE Conference on Cybernetics and Intelligent Systems, 2004, Singapore, Vol. 1, pp. 170-173.
  • [31] Mahbub, U., Patel, V., Chandre, D., Barbello, B. and Chellappa, R. (2016). Partial face detection for continuous authentication, CoRR: abs/1603.09364, https://arxiv.org/abs/1603.09364.
  • [32] Marin-Jimenez, M., Zisserman, A. and Ferrari, V. (2011). “Here’s looking at you, kid”. Detecting people looking at each other in videos, Proceedings of the British Machine Vision Conference, Dundee, UK, pp. 1-12, DOI: 10.5244/C.25.22.
  • [33] Najibi, M., Samangouei, P., Chellappa, R. and Davis, L.S. (2017). SSH: Single stage headless face detector, 2017 IEEE International Conference on Computer Vision (ICCV), Los Alamitos, USA, pp. 4885-4894.
  • [34] OpenCV (2019). Open Source Computer Vision Library, https://github.com/opencv/opencv.
  • [35] Opitz, M., Waltner, G., Poier, G., Possegger, H. and Bischof, H. (2016). Grid loss: Detecting occluded faces, CoRR: abs/1609.00129, http://arxiv.org/abs/1609.00129.
  • [36] Orozco, J., Martineza, B. and Pantic, M. (2015). Empirical analysis of cascade deformable models for multi-view face detection, Image and Vision Computing 42: 47-61.
  • [37] Pai, Y.-T., Ruan, S.-J., Shie, M.-C. and Liu, Y.-C. (2006). A simple and accurate color face detection algorithm in complex background, 2006 IEEE International Conference on Multimedia and Expo, Toronto, Canada, Vol. 2006, pp. 1545-1548.
  • [38] Ren, S., He, K., Girshick, R.B. and Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks, CoRR: abs/1506.01497, http://a rxiv.org/abs/1506.01497.
  • [39] Rosebrock, A. (2021). MiniVGGNet: Going deeper with CNNs, https://pyimagesearch.com/2021/05/22/minivggnet-going-deeper-with-cnns/.
  • [40] Saleh, S.A.M., Suandi, S.A. and Ibrahim, H. (2015). Recent survey on crowd density estimation and counting for visual surveillance, Engineering Applications of Artificial Intelligence 41: 103-114.
  • [41] Shapiro, L.G. and Stockman, G.C. (2001). Computer Vision, Prentice Hall PTR, Upper Saddle River, pp. 137-150.
  • [42] Lin, S.-F., Chen, J.-Y. and Chao, H.-X. (2001). Estimation of number of people in crowded scenes using perspective transformation, IEEE Transactions on Systems, Man, and Cybernetics A: Systems and Humans 31(6): 645-654.
  • [43] Simonyan, K. and Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition, arXiv: 1409.1556, https://arxiv.org/abs/1409.1556.
  • [44] Sun, X., Wu, P. and Hoi, S. (2017). Face detection using deep learning: An improved faster RCNN approach, Neurocomputing 299: 42-50.
  • [45] Thomaz, C. and Giraldi, G. (2010). A new ranking method for principal components analysis and its application to face image analysis, Image and Vision Computing 28(6): 902–913.
  • [46] Tsai, Y.-H., Lee, Y.-C., Ding, J.-J., Chang, R.Y. and Hsu, M.-C. (2018). Robust in-plane and out-of-plane face detection algorithm using frontal face detector and symmetry extension, Image and Vision Computing 78: 26-41.
  • [47] Viola, P. and Jones, M.J. (2004). Robust real-time face detection, International Journal of Computer Vision 57(2): 137-154, DOI: 10.1023/B:VISI.0000013087.49260.fb.
  • [48] Wan, S., Chen, Z., Zhang, T., Zhang, B. and Wong, K. (2016). Bootstrapping face detection with hard negative examples, CoRR: abs/1608.02236, http://arxiv.org/abs/1608.02236.
  • [49] Yang, B., Yan, J., Lei, Z. and Li, S.Z. (2014). Aggregate channel features for multi-view face detection, CoRR: abs/1407.4023, http://arxiv.org/abs/1407.4023.
  • [50] Yang, S., Luo, P., Loy, C.C. and Tang, X. (2015). From facial parts responses to face detection: A deep learning approach, 2015 IEEE International Conference on Computer Vision (ICCV), Las Condes, Chile, pp. 3676-3684.
  • [51] You-jia, F. and Jian-wei, L. (2010). Rotation invariant multi-view color face detection based on skin color and AdaBoost algorithm, 2015 IEEE International Conference on Computer Vision (ICCV), Wuhan, China.
  • [52] Zhao, T., Nevatia, R. and Wu, B. (2008). Segmentation and tracking of multiple humans in crowded environments, IEEE Transactions on Pattern Analysis and Machine Intelligence 30(7): 1198-1211.
  • [53] Zhu, Q., Yeh, M.-C., Cheng, K.-T.T. and Avidan, S. (2006). Fast human detection using a cascade of histograms of oriented gradients, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, New York, USA, Vol. 2, pp. 1491-1498.
  • [54] Zhu, X. and Ramanan, D. (2012). Face detection, pose estimation, and landmark localization in the wild, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Providence, USA, pp. 2879-2886.
  • [55] Zitouni, M.S. and Sluzek, A. (2022). A data association model for analysis of crowd structure, International Journal of Applied Mathematics and Computer Science 32(1): 81-94, DOI: 10.34768/amcs-2022-0007.
Notes
Record created using funds from the Polish Ministry of Education and Science (MEiN), agreement no. SONP/SP/546092/2022, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science), module: Popularization of science and promotion of sport (2022-2023)
Document type
YADDA identifier
bwmeta1.element.baztech-7003b2ca-6edd-4860-b98f-6423641173c6