A lightweight multi-person pose estimation scheme based on Jetson Nano

Liu, Lei; Blancaflor, Eric B.; Abisado, Mideth

doi:10.35784/acs-2023-01

Abstract

As the foundation of human action recognition, pose estimation is attracting growing research attention, and edge application scenarios make it more challenging. This paper proposes a lightweight multi-person pose estimation scheme to meet the needs of real-time human action recognition at the edge. The scheme uses AlphaPose to extract human skeleton nodes and adds ResNet and Dense Upsampling Convolution to improve its accuracy. We also use YOLO to enhance AlphaPose's support for multi-person pose estimation and optimize the proposed model with TensorRT. In addition, we use Jetson Nano as the Edge AI deployment device and successfully migrate the model to the edge. The experimental results show that the optimized object detection model reaches 20 FPS and the optimized multi-person pose estimation model reaches 10 FPS. At an image resolution of 320×240, the model's accuracy is 73.2%, and the scheme can meet real-time requirements. In short, our scheme can provide a basis for lightweight multi-person action recognition at the edge.
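The abstract describes a detector feeding a pose estimator. A minimal sketch of that data flow follows, assuming the usual top-down arrangement in which each detected person box is passed to a single-person pose model. The names `estimate_poses`, `fake_detector`, and `fake_pose_model` are hypothetical stand-ins for illustration only; they are not the paper's code and not the YOLO, AlphaPose, or TensorRT APIs.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height) in image pixels
Keypoint = Tuple[float, float]           # (x, y)


def to_image_coords(box: Box, normalized: List[Keypoint]) -> List[Keypoint]:
    """Map keypoints from crop-relative [0, 1] coordinates to image pixels."""
    x, y, w, h = box
    return [(x + u * w, y + v * h) for u, v in normalized]


def estimate_poses(
    frame: object,
    detect_people: Callable[[object], List[Box]],
    estimate_single_pose: Callable[[object, Box], List[Keypoint]],
) -> List[List[Keypoint]]:
    """Top-down pipeline: detect each person, then estimate one pose per box."""
    poses = []
    for box in detect_people(frame):
        normalized = estimate_single_pose(frame, box)
        poses.append(to_image_coords(box, normalized))
    return poses


# Hypothetical stand-ins for the detector and the pose model (assumptions).
def fake_detector(frame: object) -> List[Box]:
    return [(10.0, 20.0, 100.0, 200.0), (160.0, 40.0, 80.0, 160.0)]


def fake_pose_model(frame: object, box: Box) -> List[Keypoint]:
    # Two keypoints per person: top-center and center of the crop.
    return [(0.5, 0.0), (0.5, 0.5)]


poses = estimate_poses(None, fake_detector, fake_pose_model)
print(poses)
```

In this arrangement the pose model runs once per detected person, so per-frame cost grows with the number of people. At the reported 10 FPS, the whole pipeline has a budget of 100 ms per frame.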

Keywords

human pose estimation; lightweight model; Edge AI; deep learning; computer vision

Publisher

Polskie Towarzystwo Promocji Wiedzy
Lublin University of Technology

Journal

Applied Computer Science

Year

2023

Volume

Vol. 19, no. 1

Pages

1-14

Physical description

Bibliography: 30 items, figures, tables.

Authors

author

Liu Lei

liulei@hnnu.edu.com

National University, College of Computing and Information Technologies, Philippines

https://orcid.org/0000-0002-1807-6906

author

Blancaflor Eric B.

ebblancaflor@national-u.edu.ph

Mapua University, School of Information Technology, Philippines

https://orcid.org/0000-0002-7189-3040

author

Abisado Mideth

mbabisado@national-u.edu.ph

National University, College of Computing and Information Technologies, Philippines

https://orcid.org/0000-0003-4215-7260

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-5c86d03d-68c2-47cd-849c-323021e9ba57