Tytuł artykułu
Autorzy
Treść / Zawartość
Pełne teksty:
Identyfikatory
Warianty tytułu
Języki publikacji
Abstrakty
Two-dimensional human pose estimation has been widely applied in real-world applications such as sports analysis, medical fall detection, human-robot interaction, with many positive results obtained utilizing Convolutional Neural Networks (CNNs). Li et al. at CVPR 2020 proposed a study in which they achieved high accuracy in estimating 2D keypoints estimation/2D human pose estimation. However, the study performed estimation only on the cropped human image data. In this research, we propose a method for automatically detecting and estimating human poses in photos using a combination of YOLOv5 + CC (Contextual Constraints) and HRNet. Our approach inherits the speed of the YOLOv5 for detecting humans and the efficiency of the HRNet for estimating 2D keypoints/2D human pose on the images. We also performed human marking on the images by bounding boxes of the Human 3.6M dataset (Protocol #1) for human detection evaluation. Our approach obtained high detection results in the image and the processing time is 55 FPS on the Human 3.6M dataset (Protocol #1). The mean error distance is 5.14 pixels on the full size of the image (1000×1002). In particular, the average results of 2D human pose estimation/2D keypoints estimation are 94.8% of PCK and 99.2% of PDJ@0.4 (head joint). The results are available.
Słowa kluczowe
Wydawca
Rocznik
Tom
Strony
281--298
Opis fizyczny
Bibliogr. 63 poz., rys.
Twórcy
autor
- Faculty of Engineering Technology, Hung Vuong University, Vietnam
autor
- Faculty of Engineering Technology, Hung Vuong University, Vietnam
autor
- Czestochowa University of Technology, Czestochowa, Poland
autor
- AGH University of Science and Technology, Institute of Computer Science, 30-059 Kraków, Poland
autor
- University of Social Sciences, Institute of Information Technologies, 9 Sienkiewicza Street, 90-113 Łódź, Poland
autor
- Tan Trao University, Vietnam
Bibliografia
- [1] Ssd mobilenet v1 architecture (2018). [Accessed 22 Dec 2021]
- [2] Abdulla, W.: Mask r-cnn for object detection and instance segmentation on keras and tensorflow. https://github.com/matterport/Mask_RCNN (2017). [Accessed 20 Dec 2021] [3] Babu, S.C.: A 2019 guide to human pose estimation with deep learning. https://nanonets.com/blog/human-poseestimation-2d-guide/. [Online: Accessed 5 December 2021]
- [4] Bochkovskiy, A., Wang, C.Y., Liao, H.Y.M.: YOLOv4: Optimal Speed and Accuracy of Object etection. arXiv (2020)
- [5] Burrus, N.: Kinect calibration. http://nicolas.burrus.name/index.php/Research/KinectCalibration
- [6] Cao, Z., Simon, T., Wei, S.E., Sheikh, Y.: Realtime multi-person 2D pose estimation using part affinity fields. In: IEEE Conference on CVPR, vol. 2017-Janua, pp. 1302–1310 (2017). DOI 10.1109/CVPR.2017.143
- [7] Cao, Z., Simon, T., Wei, S.E., Sheikh, Y.: Realtime multi-person 2d pose estimation using part affinity fields. In: CVPR (2017)
- [8] Carreira, J., Agrawal, P., Fragkiadaki, K., Malik, J.: Human pose estimation with iterative error feedback. CoRR abs/1507.06550 (2015)
- [9] Chen, Y., Wang, Z., Peng, Y., Zhang, Z., Yu, G., Sun, J.: Cascaded Pyramid Network for Multiperson Pose Estimation. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 7103–7112 (2018). DOI 10.1109/CVPR.2018.00742
- [10] Dai, J., Li, Y., He, K., Sun, J.: R-FCN: Object detection via region-based fully convolutional networks. Advances in Neural Information Processing Systems pp. 379–387 (2016)
- [11] Dai, J., Li, Y., He, K., Sun, J.: R-fcn: Object detection via region-based fully convolutional networks. In: Advances in Neural Information Processing Systems, vol. 29. Curran Associates, Inc. (2016). https://proceedings.neurips.cc/paper/2016/file/577ef1154f3240ad5b9b413aa7346a1e-Paper.pdf
- [12] Dang, Q., Yin, J., Wang, B., Zheng, W.: Deep learning based 2D human pose estimation: A survey. TPAMI 24(6), 663–676 (2021). DOI 10. 26599/TST.2018.9010100
- [13] Gao, H.: Single shot multibox detector implementation in pytorch. https://github.com/qfgaohao/pytorch-ssd (2020). [Accessed 20 Dec 2021]
- [14] Girshick, R.: Fast R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision, vol. 2015 Inter, pp. 1440–1448 (2015). DOI 10.1109/ICCV.2015.169
- [15] Girshick, R., Donahue, J., Darrell, T., Malik, J.: Rich feature hierarchies for accurate object detection and semantic segmentation. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pp. 580–587 (2014). DOI 10.1109/CVPR.2014.81
- [16] Glen., S.: "jaccard index/similarity coefficient" from statisticshowto.com: Elementary statistics for the rest of us!https://www.statisticshowto.com/jaccard-index/.Online; accessed 6 December 2021
- [17] Haque, M.F., Lim, H.y., Kang, D.s.: Object Detection Based on VGG with ResNet Network. In: 2019 International Conference on Electronics, Information, and Communication (ICEIC), pp. 1–3. Institute of electronics and information engineers (IEIE)
- [18] He, K., Gkioxari, G., Dollar, P., Girshick, R.: Mask R-CNN. In: ICCV (2017)
- [19] He, K., Zhang, X., Ren, S., Sun, J.: Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 37(9), 1904–1916 (2015). DOI 10.1109/TPAMI.2015.2389824
- [20] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conference on CVPR, vol. 2016-Decem, pp. 770–778 (2016). DOI 10.1109/CVPR.2016.90
- [21] Huang, J., Rathod, V., Sun, C., Zhu, M., Korattikara, A., Fathi, A., Fischer, I., Wojna, Z., Song, Y., Guadarrama, S., Murphy, K.: Speed/accuracy trade-offs for modern convolutional object detectors. In: Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, vol. 2017-January, pp. 3296–3305 (2017). DOI 10.1109/CVPR.2017.351
- [22] Hung, G.L., Sahimi, M.S.B., Samma, H., Almohamad, T.A., Lahasan, B.: Faster R-CNN Deep Learning Model for Pedestrian Detection from Drone Images. In: SN Computer Science, vol. 1, pp. 1–9. Springer Singapore (2020). DOI 10.1007/s42979-020-00125-y. https://doi.org/10.1007/s42979-020-00125-y
- [23] Ionescu, C., Papava, D., Olaru, V., Sminchisescu, C.: Human3.6m: Large scale datasets and predictive methods for 3d human sensing in natural environments. TPAMI 36(7), 1325–1339 (2014)
- [24] Jocher, G.R.: Head and person detection model. https://github.com/deepakcrk/yolov5-crowdhuman. Online; accessed 6 December 2021
- [25] Jocher, G.R.: Yolov5 tutorials. https://github.com/ultralytics/yolov5. Online; accessed 6 December 2021
- [26] Jonathan, H.: Object detection: speed and accuracy comparison (faster r-cnn, r-fcn, ssd, fpn, retinanet and yolov3) (2018). [Accessed 18 Dec 2021]
- [27] Krishnan, S.: Person-detection. https://github.com/SusmithKrishnan/person-detection (2021). [Accessed 20 Dec 2021]
- [28] Li, N.: Evoskeleton, cascaded 2d-to-3d lifting.https://github.com/Nicholasli1995/EvoSkeleton. Online; accessed 25 December 2021
- [29] Li, S., Ke, L., Pratama, K., Tai, Y.W., Tang, C.K., Cheng, K.T.: Cascaded deep monocular 3d human pose estimation with evolutionary training data. In: The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
- [30] Liang, S., Sun, X., Wei, Y.: Compositional Human Pose Regression. In: ICCV, vol. 176-177, pp. 1–8 (2017). DOI 10.1016/j.cviu.2018.10.006
- [31] Lin, T.Y., Maire, M., Belongie, S., Bourdev, L., Girshick, R., Hays, J., Perona, P., Ramanan, D., Zitnick, C.L., Dollár, P.: Microsoft coco: Common objects in context (2014). http://arxiv.org/abs/1405.0312
- [32] Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., Fu, C.Y., Berg, A.C.: SSD: Single shot multibox detector. In: European Conference on Computer Vision, vol. 9905 LNCS, pp. 21–37 (2016). DOI 10.1007/978-3-319-46448-0_2
- [33] Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S.E., Fu, C.Y., Berg, A.C.: Ssd: Single shot multibox detector. In: B. Leibe, J. Matas, N. Sebe, M. Welling (eds.) ECCV (1), Lecture Notes in Computer Science, vol. 9905, pp. 21–37. Springer (2016). http://dblp.uni-trier.de/db/conf/eccv/eccv2016-1.htmlLiuAESRFB16
- [34] Luvizon, D.C., Tabia, H., Picard, D.: Human pose regression by combining indirect part detection and contextual information. Computers and Graphics (Pergamon) 85, 15–22 (2019). DOI 10.1016/j.cag. 2019.09.002
- [35] Newell, A., Yang, K., Deng, J.: Stacked hourglass networks for human pose estimation. In: Computer Vision – ECCV 2016, pp. 483–499. Springer International Publishing (2016)
- [36] Newell, A., Yang, K., Deng, J.: Stacked Hourglass Networks for Human Pose Estimation. In: ECCV (2016)
- [37] openpose: openpose. https://github.com/CMUPerceptual-Computing-Lab/openpose (2019). [Accessed 23 April 2019]
- [38] Ramanan, D.: Learning to parse images of articulated bodies. In: In NIPS (2006)
- [39] Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2016-Decem, pp. 779–788 (2016). DOI 10.1109/CVPR.2016.91
- [40] Redmon, J., Farhadi, A.: Yolo9000: Better, faster, stronger. arXiv preprint arXiv:1612.08242 (2016)
- [41] Redmon, J., Farhadi, A.: YOLO9000: Better, faster, stronger. In: Proceedings - 30th IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2017, vol. 2017-Janua, pp. 6517–6525 (2017). DOI 10.1109/CVPR.2017.690
- [42] Redmon, J., Farhadi, A.: Yolov3 an incremental improvement (2018). http://arxiv.org/abs/1804.02767. [Accessed 18 April 2021]
- [43] Ren, S., He, K., Girshick, R., Sun, J.: Faster r-cnn: Towards real-time object detection with region proposal networks. In: Advances in Neural Information Processing Systems 28, pp. 91–99 (2015)
- [44] Ren, S., He, K., Girshick, R., Sun, J.: Faster RCNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2017). DOI 10.1109/TPAMI.2016.2577031
- [45] Sapp, B., Taskar, B.: In: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition. DOI 10.1109/CVPR. 2013.471
- [46] Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: 3rd International Conference on Learning Representations, ICLR 2015 - Conference Track Proceedings, pp. 1–14 (2015)
- [47] Sun, K., Xiao, B., Liu, D., Wang, J.: Deep highresolution representation learning for human pose estimation. In: CVPR (2019)
- [48] Tan, D.: Image geometric transformation in numpy and opencv. https://towardsdatascience.com/imagegeometric-transformation-in-numpy-and-opencv936f5cd1d315 (2019). Online; accessed 6 December 2021
- [49] Thanh, N.T., Hùng, L.V., Công, P.T.: An Evaluation of Pose Estimation in Video of Traditional Martial Arts Presentation. Journal of Research and Development on Information and Communication Technology 2019(2), 114–126 (2019). DOI 10.32913/mic-ict-research.v2019.n2.864
- [50] Tompson, J., Goroshin, R., Jain, A., LeCun, Y., Bregler, C.: Efficient object localization using convolutional networks. In: CVPR, pp. 648–656. IEEE Computer Society (2015)
- [51] Toshev, A., Szegedy, C.: Deeppose: Human pose estimation via deep neural networks. CoRR abs/1312.4659 (2013). http://dblp.unitrier.de/db/journals/corr/corr1312.htmlToshevS13
- [52] Toshev, A., Szegedy, C.: DeepPose: Human Pose Estimation via Deep Neural Networks. In: IEEE Conference on CVPR (2014)
- [53] Wang, J., Sun, K., Cheng, T., Jiang, B., Deng, C., Zhao, Y., Liu, D., Mu, Y., Tan, M., Wang, X., Liu, W., Xiao, B.: Deep high-resolution representation learning for visual recognition. TPAMI
- [54] Wei, S.E., Ramakrishna, V., Kanade, T., Sheikh, Y.: Convolutional pose machines. In: CVPR (2016)
- [55] Weiming Chen , Zijie Jiang, H.G., Ni, X.: Fall Detection Based on Key Points of of human-skeleton using openpose. Symmetry (2020)
- [56] Willett, N.S., Shin, H.V., Jin, Z., Li, W., Finkelstein, A.: Pose2Pose: Pose Selection and Transfer for 2D Character Animation. In: International Conference on Intelligent User Interfaces, Proceedings IUI, pp. 88–99 (2020). DOI 10.1145/3377325.3377505
- [57] Xiao, B., Wu, H., Wei, Y.: Simple baselines for human pose estimation and tracking. In: European Conference on Computer Vision (ECCV) (2018)
- [58] Yang, W.: Human Pose Estimation 101. https://github.com/cbsudux/Human-PoseEstimation-101percentage-of-correct-keypoints—pck (2019). [Accessed 18 April 2021]
- [59] Yang, W., Ouyang, W., Li, H., Wang, X.: Endto-end learning of deformable mixture of parts and deep convolutional neural networks for human pose estimation. In: CVPR (2016)
- [60] Yang, W., Ouyang, W., Li, H., Wang, X.: End-to-end learning of deformable mixture of parts and deep convolutional neural networks for human pose estimation. https://github.com/bearpaw/eval_pose (2016). Online; accessed 20 December 2021
- [61] Zhang, H., Sciutto, C., Agrawala, M., Fatahalian, K.: Vid2Player: Controllable Video Sprites That Behave and Appear Like Professional Tennis Players. ACM Transactions on Graphics 40(3), 1–16 (2021). DOI 10.1145/3448978
- [62] Zhang, X., Zou, J., He, K., Sun, J.: Accelerating Very Deep Convolutional Networks for Classification and Detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 38(10), 1943–1955 (2016). DOI 10.1109/TPAMI.2015.2502579
- [63] Zhou, X., Huang, Q., Sun, X., Xue, X., Wei, Y. Towards 3d human pose estimation in the wild: A weakly-supervised approach. In: The IEEE International Conference on Computer Vision (ICCV) (2017)
Typ dokumentu
Bibliografia
Identyfikator YADDA
bwmeta1.element.baztech-b90ac873-a7bf-4d76-9cc5-be438e1eb35b