2022 | Vol. 30 | 181--190
Article title

A lightweight approach to two-person interaction classification in sparse image sequences

Conference
Federated Conference on Computer Science and Information Systems (17 ; 04-07.09.2022 ; Sofia, Bulgaria)
Publication languages
EN
Abstracts
EN
A lightweight neural network-based approach to two-person interaction classification in sparse image sequences, based on predetection of human skeletons in video frames, is proposed. The idea is to use an ensemble of “weak” pose classifiers, where every classifier is trained on a different time-phase of the same set of actions. Thus, unlike in typical ensemble classifiers, the expertise of the “weak” classifiers is distributed over time and not over the feature domain. Every classifier is trained independently to classify time-indexed snapshots of a visual action, while the overall classification result is a weighted combination of their results. The training data require no extra labeling effort, as the individual frames are automatically assigned time indices. The use of pose classifiers for video classification is key to achieving a lightweight solution, as it limits the motion-based feature space in the deep encoding stage. Another important element is the exploitation of the semantics of the skeleton data, which turns the input data into reliable and powerful feature vectors. In other words, we avoid spending ANN resources on learning feature-related information that can already be extracted analytically from the skeleton data. An algorithm for merging-elimination and normalization of skeleton joints is developed. Our method is trained and tested on the interaction subset of the well-known NTU-RGB+D dataset, although only 2D skeleton information is used, as is typical in video analysis. The test results show that the performance of our method is comparable with some of the best STM and CNN-based classifiers reported so far for this dataset, when they process sparse frame sequences, as we did. The recently proposed multistream Graph CNNs have shown superior results, but only when processing dense frame sequences.
Considering the dominant processing time and resources needed for skeleton estimation in every frame of the sequence, the key to real-time interaction recognition is to limit the number of processed frames.
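The time-phase ensemble described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the names `phase_index` and `classify_sequence`, the equal-width split of a sequence into phases, and the normalized weighted sum are all assumptions made for illustration. Each per-phase classifier is represented as a plain callable that maps a frame's feature vector to class probabilities.

```python
from typing import Callable, List, Sequence

Probs = List[float]
# Hypothetical interface: a pose classifier maps one frame's feature
# vector to a list of class probabilities.
PoseClassifier = Callable[[Sequence[float]], Probs]


def phase_index(frame_pos: int, n_frames: int, n_phases: int) -> int:
    """Assign a time-phase index in [0, n_phases) to a frame position.

    Assumption: phases are equal-width slices of the sequence, so no
    manual labeling is needed to derive the index.
    """
    if n_frames <= 0 or not 0 <= frame_pos < n_frames:
        raise ValueError("frame_pos must lie in [0, n_frames)")
    return min(frame_pos * n_phases // n_frames, n_phases - 1)


def classify_sequence(
    frames: Sequence[Sequence[float]],
    classifiers: Sequence[PoseClassifier],
    weights: Sequence[float],
) -> Probs:
    """Combine per-phase classifier outputs into one class distribution.

    Every frame is routed to the classifier of its time phase; the
    outputs are accumulated with that phase's weight and normalized.
    """
    if len(classifiers) != len(weights):
        raise ValueError("one weight per classifier is required")
    if not frames:
        raise ValueError("at least one frame is required")
    scores: Probs = []
    for pos, features in enumerate(frames):
        k = phase_index(pos, len(frames), len(classifiers))
        probs = classifiers[k](features)
        if not scores:
            scores = [0.0] * len(probs)
        for c, p in enumerate(probs):
            scores[c] += weights[k] * p
    total = sum(scores)
    return [s / total for s in scores]
```

With a sparse sequence of only a few frames, each frame is scored by the classifier trained for its phase, and the predicted interaction class is the index of the largest entry in the returned list.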
Publisher

Year
Volume
Pages
181--190
Physical description
Bibliography: 34 items, illustrations, tables, charts.
Authors
  • Warsaw University of Technology Institute of Control and Computation Eng. ul. Nowowiejska 15/19 00-665 Warszawa, Poland, wlodzimierz.kasprzak@pw.edu.pl
References
  • 1. M. Liu and J. Yuan, “Recognizing Human Actions as the Evolution of Pose Estimation Maps”, in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, Salt Lake City, UT, USA, June 18-22, 2018, pp. 1159-1168, http://dx.doi.org/10.1109/CVPR.2018.00127.
  • 2. E. Cippitelli, E. Gambi, S. Spinsante, and F. Florez-Revuelta, “Evaluation of a skeleton-based method for human activity recognition on a large-scale RGB-D dataset,” in 2nd IET International Conference on Technologies for Active and Assisted Living (TechAAL 2016), London, UK, 24-25 October 2016, pp. 1-6, http://dx.doi.org/10.1049/ic.2016.0063.
  • 3. S. Zhang, Z. Wei, J. Nie, L. Huang, S. Wang, and Z. Li, “A Review on Human Activity Recognition Using Vision-Based Method,” Journal of Healthcare Engineering, Hindawi, vol. 2017, Article ID 3090343, 31 pages, 2017, http://dx.doi.org/10.1155/2017/3090343, https://www.hindawi.com/journals/jhe/2017/3090343/
  • 4. A. Wilkowski, W. Kasprzak and M. Stefanczyk, “Object detection in the police surveillance scenario,” in Proceedings of the 2019 Federated Conference on Computer Science and Information Systems, ACSIS, vol. 18, 2019, pp. 363-372, http://dx.doi.org/10.15439/2019F291 .
  • 5. A. Stergiou and R. Poppe, “Analyzing human-human interactions: A survey,” Computer Vision and Image Understanding, Elsevier, vol. 188, 2019, p. 102799, http://dx.doi.org/10.1016/j.cviu.2019.102799, https://www.sciencedirect.com/science/article/pii/S1077314219301158
  • 6. A. Bevilacqua, K. MacDonald, A. Rangarej, V. Widjaya, B. Caulfield, and T. Kechadi, “Human Activity Recognition with Convolutional Neural Networks,” in Machine Learning and Knowledge Discovery in Databases, ECML PKDD 2018, Lecture Notes in Computer Science, vol. 11053, Springer, Cham, Switzerland, 2019, pp. 541-552, http://dx.doi.org/10.1007/978-3-030-10997-4_33.
  • 7. N. A. Mac and N. H. Son, “Rotation Invariance in Graph Convolutional Networks,” in Proceedings of the 16th Conference on Computer Science and Intelligence Systems, ACSIS, vol. 25, 2021, pp. 81–90, http://dx.doi.org/10.15439/2021F140 .
  • 8. Z. Cao, G. Hidalgo, T. Simon, S.-E. Wei, and Y. Sheikh, “OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 1, pp. 172-186, Jan. 2021, http://dx.doi.org/10.1109/TPAMI.2019.2929257.
  • 9. A. Toshev and C. Szegedy, “DeepPose: Human Pose Estimation via Deep Neural Networks,” in 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 2014, pp. 1653-1660, http://dx.doi.org/10.1109/CVPR.2014.214.
  • 10. E. Insafutdinov, L. Pishchulin, B. Andres, M. Andriluka, and B. Schiele, “Deepercut: a deeper, stronger, and faster multi-person pose estimation model,” in Computer Vision – ECCV 2016, Lecture Notes in Computer Science, vol. 9910, Springer, Cham, Switzerland, 2016, pp. 34-50, https://doi.org/10.1007/978-3-319-46466-4_3.
  • 11. H.-D. Duan, J. Wang, K. Chen and D. Lin, “PYSKL: Towards Good Practices for Skeleton Action Recognition,” https://arxiv.org/abs/2205.09443v1[cs.CV], 15 May 2022, https://arxiv.org/abs/2205.09443v1 (accessed on 15.07.2022).
  • 12. [Online], “Papers with code. Action recognition in videos,” https://paperswithcode.com/task/action-recognition-in-videos, (accessed on 15.07.2022).
  • 13. R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton, “Adaptive mixtures of local experts”, Neural Computation, vol. 3, no. 1, pp. 79–87, March 1991, http://dx.doi.org/10.1162/neco.1991.3.1.79.
  • 14. A. Shahroudy, J. Liu, T.-T. Ng, and G. Wang, “NTU RGB+D: A Large Scale Dataset for 3D Human Activity Analysis,” https://arxiv.org/abs/1604.02808[cs.CV], 2016, https://arxiv.org/abs/1604.02808 (accessed on 15.07.2022).
  • 15. H. Meng, M. Freeman, N. Pears, and C. Bailey, “Real-time human action recognition on an embedded, reconfigurable video processing architecture,” J. Real-Time Image Proc., vol. 3, no. 3, pp. 163–176, 2008, http://dx.doi.org/10.1007/s11554-008-0073-1.
  • 16. K.G. Manosha Chathuramali and R. Rodrigo, “Faster human activity recognition with SVM,” International Conference on Advances in ICT for Emerging Regions (ICTer2012), Colombo, Sri Lanka, 12-15 December 2012, IEEE, 2012, pp. 197-203, http://dx.doi.org/10.1109/icter.2012.6421415.
  • 17. X. Yan and Y. Luo, “Recognizing human actions using a new descriptor based on spatial–temporal interest points and weighted-output classifier,” Neurocomputing, Elsevier, vol. 87, pp. 51–61, 15 June 2012, http://dx.doi.org/10.1016/j.neucom.2012.02.002.
  • 18. R. Vemulapalli, F. Arrate, and R. Chellappa, “Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group,” in 2014 IEEE Conference on Computer Vision and Pattern Recognition, 23-28 June 2014, Columbus, OH, USA, IEEE, 2014, pp. 588-595, doi: 10.1109/cvpr.2014.82.
  • 19. J. Liu, A. Shahroudy, D. Xu, and G. Wang, “Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition,” in Computer Vision – ECCV 2016, Lecture Notes in Computer Science, vol. 9907, Springer, Cham, Switzerland, 2016, pp. 816–833, http://dx.doi.org/10.1007/978-3-319-46487-9_50.
  • 20. C. Li, Q. Zhong, D. Xie, and S. Pu, “Skeleton-based Action Recognition with Convolutional Neural Networks,” 2017 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), 10-14 July 2017, Hong Kong, pp. 597-600, http://dx.doi.org/10.1109/ICMEW.2017.8026285.
  • 21. D. Liang, G. Fan, G. Lin, W. Chen, X. Pan, and H. Zhu, “Three-Stream Convolutional Neural Network With Multi-Task and Ensemble Learning for 3D Action Recognition,” 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 16-17 June 2019, Long Beach, CA, USA, IEEE, pp. 934-940, http://dx.doi.org/10.1109/cvprw.2019.00123.
  • 22. S. Yan, Y. Xiong, and D. Lin, “Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition,” https://arxiv.org/abs/1801.07455 [cs.CV], 2018, https://arxiv.org/abs/1801.07455, (accessed on 15.07.2022).
  • 23. M. Li, S. Chen, X. Chen, Y. Zhang, Y. Wang, and Q. Tian, “Actional-Structural Graph Convolutional Networks for Skeleton-Based Action Recognition,” in 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15-20 June 2019, pp. 3590-3598, http://dx.doi.org/10.1109/CVPR.2019.00371.
  • 24. L. Shi, Y. Zhang, J. Cheng and H.-Q. Lu, “Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition,” https://arxiv.org/abs/1805.07694v3 [cs.CV] , 10 July 2019, http://dx.doi.org/10.48550/ARXIV.1805.07694, https://arxiv.org/abs/1805.07694v3, (accessed on 15.07.2022).
  • 25. L. Shi, Y. Zhang, J. Cheng, and H.-Q. Lu, “Skeleton-based action recognition with multi-stream adaptive graph convolutional networks,” IEEE Transactions on Image Processing, vol. 29, pp. 9532-9545, October 2020, http://dx.doi.org/10.1109/TIP.2020.3028207 .
  • 26. M. Perez, J. Liu, and A.C. Kot, “Interaction Relational Network for Mutual Action Recognition,” https://arxiv.org/abs/1910.04963 [cs.CV], 2019, https://arxiv.org/abs/1910.04963 (accessed on 15.07.2022).
  • 27. L-P. Zhu, B. Wan, C.-Y. Li, G. Tian, Y. Hou and K. Yuan, “Dyadic relational graph convolutional networks for skeleton-based human interaction recognition,” Pattern Recognition, Elsevier, vol. 115, 2021, p. 107920, http://dx.doi.org/10.1016/j.patcog.2021.107920.
  • 28. [Online], “openpose”, CMU-Perceptual-Computing-Lab, 2021 https://github.com/CMU-Perceptual-Computing-Lab/openpose/ , (accessed on 15.07.2022).
  • 29. [Online], “Keras: the Python deep learning API,” https://keras.io/ , (accessed on 15.07.2022).
  • 30. [Online], “Keras Tuner,” https://keras-team.github.io/keras-tuner/ , (accessed on 15.07.2022).
  • 31. T. Yu and H. Zhu, “Hyper-Parameter Optimization: A Review of Algorithms and Applications,” https://arxiv.org/abs/2003.05689 [cs.LG], 12 Mar 2020, https://arxiv.org/abs/2003.05689 , (accessed on 15.07.2022).
  • 32. J. Liu, A. Shahroudy, G. Wang, L.-Y. Duan, and A. C. Kot, “Skeleton-Based Online Action Prediction Using Scale Selection Network,” IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), vol. 42, no. 6, pp. 1453–1467, 1 June 2020, http://dx.doi.org/10.1109/T-PAMI.2019.2898954.
  • 33. J. Liu, G. Wang, P. Hu, L.-Y. Duan, and A. C. Kot, “Global Context-Aware Attention LSTM Networks for 3D Action Recognition,” in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI, USA, 21-26 July 2017, pp. 3671-3680, http://dx.doi.org/10.1109/CVPR.2017.391.
  • 34. J. Liu, G. Wang, L.-Y. Duan, K. Abdiyeva, and A. C. Kot, “Skeleton-Based Human Action Recognition with Global Context-Aware Attention LSTM Networks,” IEEE Transactions on Image Processing (TIP), vol. 27, no. 4, pp. 1586-1599, April 2018, http://dx.doi.org/10.1109/TIP.2017.2785279.
Notes
1. Track 3: 4th International Workshop on Artificial Intelligence in Machine Vision and Graphics
2. Record prepared with funds from MEiN, agreement no. SONP/SP/546092/2022, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science), module: Popularization of science and promotion of sport (2022-2023).
Document type
Bibliography
Identifiers
YADDA identifier
bwmeta1.element.baztech-b8a53d86-3fab-42e2-86a7-c1a20ca2d2dc