Article title

Polish dance music classification based on mel spectrogram decomposition

Publication languages
EN
Abstracts
EN
Folk dances and music are essential aspects of intangible cultural heritage, reflecting the history and traditions of nations. Due to dynamic changes in social structure, many national traditions are no longer cultivated and are therefore being forgotten, so methods are needed to preserve these valuable aspects of culture. There are five Polish national dances: the Polonez, the Oberek, the Mazur, the Krakowiak, and the Kujawiak. They reflect key elements of Polish intangible cultural heritage, visible both in the way the dances are performed and in their music. Many audio and video recordings have been preserved, differing in features such as composer and version. The primary objective of this study was to apply machine learning approaches to distinguish the music of the Polish national dances listed above. A dataset of 137 dance recordings in MP3 format was created, and each recording was divided into ten-second files reflecting the characteristic elements of each dance. Because the mel scale models human auditory perception, a mel spectrogram was generated from every file. Four widely used classification models were compared: VGG16, ResNet50, DenseNet121, and MobileNetV2, with performance measured by accuracy, precision, recall, and F1 score. ResNet50 achieved the best testing accuracy (over 90%), while DenseNet121 had the best testing loss (0.38).
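The preprocessing pipeline summarized in the abstract (ten-second clips converted to mel spectrograms before CNN classification) can be sketched in plain NumPy. This is an illustrative reconstruction, not the authors' code: the FFT size, hop length, and number of mel bands below are assumed values, and a real pipeline would typically rely on an audio library such as librosa.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale: approximately linear below 1 kHz, logarithmic above
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters with centers spaced evenly on the mel scale
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(signal, sr, n_fft=1024, hop=512, n_mels=64):
    # Frame the signal, apply a Hann window, take the power FFT per frame,
    # project onto the mel filterbank, and convert to decibels
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    return 10.0 * np.log10(mel + 1e-10)

# A synthetic ten-second 440 Hz tone stands in for one dance clip
sr = 22050
t = np.arange(10 * sr) / sr
clip = np.sin(2 * np.pi * 440.0 * t)
S = log_mel_spectrogram(clip, sr)
print(S.shape)  # frames x mel bands
```

The resulting two-dimensional array can be rendered as an image and fed to image-classification networks such as the VGG16, ResNet50, DenseNet121, and MobileNetV2 models compared in the study.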
Authors
  • Department of Computer Science, Faculty of Electrical Engineering and Computer Science, Lublin University of Technology, ul. Nadbystrzycka 38D, 20-618 Lublin, Poland
  • Department of Computer Science, Faculty of Electrical Engineering and Computer Science, Lublin University of Technology, ul. Nadbystrzycka 38D, 20-618 Lublin, Poland
Bibliography
  • 1. Wargowska-Dudek, A. Ochrona dziedzictwa kulturowego na przykładzie polskich tańców narodowych – praktyki depozytariusza Henryka Dudy (in Polish). Perspektywy Kultury, 2023; 40(1): 95–108. https://doi.org/10.35765/pk.2023.4001.08.
  • 2. NID. (2024, April 23). Krajowa Lista Niematerialnego Dziedzictwa Kulturowego (in Polish) – NID. https://niematerialne.nid.pl/en/niematerialne-dziedzictwo-kulturowe/krajowa-lista-niematerialnego-dziedzictwa-kulturowego/ (Accessed: 08.06.2024).
  • 3. Olha, S. Elements of classic choreography at the academization of Polish folk-stage dance. Zenodo (in Ukrainian) (CERN European Organization for Nuclear Research). 2023. https://doi.org/10.5281/zenodo.7806696.
  • 4. Skublewska-Paszkowska, M., Powroznik, P., Smolka, J., Milosz, M., Lukasik, E., Mukhamedova, D., & Milosz, E. Methodology of 3D scanning of intangible cultural heritage – the example of Lazgi Dance. Applied Sciences, 2021; 11(23): 11568. https://doi.org/10.3390/app11231156.
  • 5. Li, N.P. The mediating effect of artificial intelligence on the relationship between cultural heritage preservation and opera music: A case study of Shanxi Opera. Evolutionary Studies in Imaginative Culture, 2024; 249–267.
  • 6. Yu, T., Wang, X., Xiao, X., Yu, R. Harmonizing tradition with technology: Using AI in traditional music preservation. Conference: 2024 International Joint Conference on Neural Networks (IJCNN), 2024; 15: 1–8. https://doi.org/10.1109/ijcnn60899.2024.10651124.
  • 7. Rallis, I., Voulodimos, A., Bakalos, N., Protopapadakis, E., Doulamis, N., Doulamis, A. Machine learning for intangible cultural heritage: A review of techniques on dance analysis. Springer Series on Cultural Computing, 2020; 103–119. https://doi.org/10.1007/978-3-030-37191-3_6.
  • 8. Huang, L., Song, Y. Intangible cultural heritage management using machine learning model: A case study of northwest folk song huaer. Scientific Programming, 2022; 1–9. https://doi.org/10.1155/2022/1383520.
  • 9. Stacchio, L., Garzarella, S., Cascarano, P., De Filippo, A., Cervellati, E., & Marfia, G. DanXe: An extended artificial intelligence framework to analyze and promote dance heritage. Digital Applications in Archaeology and Cultural Heritage, 2024; 33: e00343. https://doi.org/10.1016/j.daach.2024.e00343.
  • 10. Skublewska-Paszkowska, M., Powroźnik, P., Barszcz, M., Dziedzic, K., Aristodou, A. Identifying and animating movement of zeibekiko sequences by spatial temporal graph convolutional network with multi-attention modules. Advances in Science and Technology. Research Journal, 2024; 18(8).
  • 11. Schedl, M., Gómez, E., Urbano, J. Music information retrieval: Recent developments and applications. Foundations and Trends in Information Retrieval, 2014; 8(2–3): 127–261. https://doi.org/10.1561/1500000042.
  • 12. Mehta, J., Gandhi, D., Thakur, G., Kanani, P. Music genre classification using transfer learning on log-based MEL Spectrogram. Conference: 2021 5th International Conference on Computing Methodologies and Communication (ICMC), 2021. https://doi.org/10.1109/iccmc51019.2021.9418035.
  • 13. Li, J., Han, L., Li, X., Zhu, J., Yuan, B., Gou, Z. An evaluation of deep neural network models for music classification using spectrograms. Multimedia Tools and Applications, 2021; 81(4): 4621–4647. https://doi.org/10.1007/s11042-020-10465-9.
  • 14. Dhall, A., Murthy, Y.V.S., Koolagudi, S.G. Music genre classification with convolutional neural networks and comparison with f, q, and mel spectrogram-based images. In Advances in Intelligent Systems and Computing, 2021; 235–248. https://doi.org/10.1007/978-981-33-6881-1_20.
  • 15. Powroznik, P., Wojcicki, P., Przylucki, S.W. Scalogram as a representation of emotional speech. IEEE Access, 2021; 9: 154044-154057. https://doi.org/10.1109/ACCESS.2021.3127581.
  • 16. Czerwinski, D., Powroznik, P. Human emotions recognition with the use of speech signal of Polish language. 2018 Conference on Electrotechnology: Processes, Models, Control and Computer Science (EPMCCS), 2018; 1-6. https://doi.org/10.1109/EPMCCS.2018.8596404.
  • 17. gtzan. (n.d.). TensorFlow. A gtzan dataset. https://www.tensorflow.org/datasets/catalog/gtzan (Accessed: 08.06.2024).
  • 18. Mdeff. (n.d.). GitHub – mdeff/fma: FMA: A dataset for music analysis. GitHub. https://github.com/mdeff/fma (Accessed: 08.06.2024).
  • 19. Defferrard, M., Benzi, K., Vandergheynst, P., Bresson, X. FMA: A dataset for music analysis. arXiv (Cornell University), 2017; 316–323. https://doi.org/10.48550/arXiv.1612.01840.
  • 20. Hassen, A.K., Janßen, H., Assenmacher, D., Preuss, M., Vatolkin, I. Classifying music genres using image classification neural networks. Archives of Data Science, Series A (Online First), 2018; 5(1), 20. https://doi.org/10.5445/ksp/1000087327/20.
  • 21. Rawat, P., Bajaj, M., Vats, S., Sharma, V. A comprehensive study based on MFCC and spectrogram for audio classification. Journal of Information & Optimization Sciences, 2023; 44(6): 1057–1074. https://doi.org/10.47974/jios-1431.
  • 22. Yin, T. Music track recommendation using Deep-CNN and MEL Spectrograms. Journal on Special Topics in Mobile Networks and Applications/Mobile Networks and Applications, 2023. https://doi.org/10.1007/s11036-023-02170-2.
  • 23. Matocha, M., Zielinski, S.K. Music genre recognition using convolutional neural networks. Advances in Computer Science Research, 2018; 14: 125–142. https://doi.org/10.24427/acsr-2018-vol14-0008.
  • 24. Jang, B., Heo, W., Kim, J., & Kwon, O. Music detection from broadcast contents using convolutional neural networks with a Mel-scale kernel. EURASIP Journal on Audio, Speech and Music Processing, 2019; 1. https://doi.org/10.1186/s13636-019-0155-y.
  • 25. Sawant, O., Bhowmick, A., Bhagwat, G. Separation of speech & music using temporal-spectral features and neural classifiers. Evolutionary Intelligence, 2023. https://doi.org/10.1007/s12065-023-00828-0.
  • 26. Hizlisoy, S., Arslan, R.S., Çolakoğlu, E. Singer identification model using data augmentation and enhanced feature conversion with hybrid feature vector and machine learning. EURASIP Journal on Audio, Speech and Music Processing, 2024: 1. https://doi.org/10.1186/s13636-024-00336-8.
  • 27. Ning, Q., Shi, J. Artificial neural network for folk music style classification. Journal of Mobile Information Systems, 2022; 1–9. https://doi.org/10.1155/2022/9203420.
  • 28. Wang, X. Research on recognition and classification of folk music based on feature extraction algorithm. Informatica, 2020; 44(4). https://doi.org/10.31449/inf.v44i4.3388.
  • 29. Mi, D., & Qin, L. Classification system of national music rhythm spectrogram based on biological neural network. Computational Intelligence and Neuroscience, 2022: 1–10. https://doi.org/10.1155/2022/2047576.
  • 30. Mirza, F.K., Gürsoy, A.F., Baykaş, T., Hekimoğlu, M., Pekcan, Ö. Residual LSTM neural network for time-dependent consecutive pitch string recognition from spectrograms: A study on Turkish classical music makams. Multimedia Tools and Applications, 2023; 83(14): 41243–41271. https://doi.org/10.1007/s11042-023-17105-y.
  • 31. Abed, M.H., Al-Asfoor, M., Hussain, Z.M. Architectural heritage images classification using deep learning with CNN. Conference: VIPERC 2020 Visual Pattern Extraction and Recognition for Cultural Heritage Understanding, 2020; 2602: 1–12.
  • 32. Zou, Z., Yu, Z., Guo, Y., Li, Y., Liang, D., Cao, Y., Zhang, S. Triplane Meets Gaussian Splatting: Fast and Generalizable Single-View 3D Reconstruction with Transformers. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024; 10324–10335. https://doi.org/10.1109/cvpr52733.2024.00983.
  • 33. Vu, M., Beurton-Aimar, M., Le, V. Heritage image classification by convolutional neural networks. IEEE, 2018. https://doi.org/10.1109/mapr.2018.8337517.
  • 34. Babić, R.J. A comparison of methods for image classification of cultural heritage using transfer learning for feature extraction. Neural Computing and Applications, 2023; 36(20): 11699–11709. https://doi.org/10.1007/s00521-023-08764-x.
  • 35. Mehta, S., Kukreja, V., Bordoloi, D. Heritage coin identification using convolutional neural networks: A multi-classification approach for numismatic research. IEEE, 2023. https://doi.org/10.1109/icaiss58487.2023.10250481.
  • 36. Loupas, G., Pistola, T., Diplaris, S., Ioannidis, K., Vrochidis, S., Kompatsiaris, I. Comparison of deep learning techniques for video-based automatic recognition of Greek folk dances. In Lecture Notes in Computer Science, 2023; 325–336. https://doi.org/10.1007/978-3-031-27818-1_27.
  • 37. Jain, N., Bansal, V., Virmani, D., Gupta, V., Salas-Morera, L., Garcia-Hernandez, L. An enhanced deep convolutional neural network for classifying Indian classical dance forms. Applied Sciences, 2021; 11(14): 6253. https://doi.org/10.3390/app11146253.
  • 38. Powroznik, P., Czerwiński, D. Spectral methods in Polish emotional speech recognition. Advances in Science and Technology. Research Journal, 2016; 10(32). https://doi.org/10.12913/22998624/65138.
  • 39. Kumar, C.S.A., Maharana, A.D., Krishnan, S.M., Hanuma, S.S.S., Lal, G.J., Ravi, V. Speech emotion recognition using CNN-LSTM and Vision Transformer. In Lecture Notes in Networks and Systems, 2023; 86–97. https://doi.org/10.1007/978-3-031-27499-2_8.
  • 40. Powroznik, P. Polish emotional speech recognition using artificial neural network. Advances in Science and Technology. Research Journal, 2014; 8(24): 24–27. https://doi.org/10.12913/22998624/562.
  • 41. Utebayeva, D., Ilipbayeva, L., Matson, E.T. Practical study of recurrent neural networks for efficient real-time drone sound detection: A review. Drones, 2023; 7(1): 26. https://doi.org/10.3390/drones7010026.
  • 42. Bahuleyan, H. Music genre classification using machine learning techniques. arXiv (Cornell University), 2018. https://doi.org/10.48550/arxiv.1804.01149.
  • 43. Simonyan, K., & Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv (Cornell University), 2014. https://doi.org/10.48550/arxiv.1409.1556.
  • 44. Russakovsky, O., Deng, J., Su, H., Krause, J., Sathesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., & Fei-Fei, L. ImageNet large scale visual recognition challenge. International Journal of Computer Vision, 2015; 115(3): 211–252. https://doi.org/10.1007/s11263-015-0816-y.
  • 45. He, K., Zhang, X., Ren, S., Sun, J. Deep residual learning for image recognition. arXiv (Cornell University), 2015. https://doi.org/10.48550/arxiv.1512.03385.
  • 46. Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q. Densely connected convolutional networks. arXiv (Cornell University), 2016. https://doi.org/10.48550/arxiv.1608.06993.
  • 47. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L. MobileNetV2: Inverted residuals and linear bottlenecks. arXiv (Cornell University), 2018. https://doi.org/10.48550/arxiv.1801.04381.
  • 48. Skublewska-Paszkowska, M., Powroznik, P., Lukasik, E. Learning three-dimensional tennis shots using graph convolutional networks. Sensors, 2020; 20(21): 6094. https://doi.org/10.3390/s20216094.
  • 49. Hossin, M., Sulaiman, M.N. A review on evaluation metrics for data classification evaluations. International Journal of Data Mining and Knowledge Management Process, 2015; 5(2): 01–11. https://doi.org/10.5121/ijdkp.2015.5201.
  • 50. Skublewska-Paszkowska, M., & Powroznik, P. Temporal pattern attention for multivariate time series of tennis strokes classification. Sensors, 2023; 23(5): 2422. https://doi.org/10.3390/s23052422.
  • 51. Marom, N.D., Rokach, L., Shmilovici, A. Using the confusion matrix for improving ensemble classifiers. 2010 IEEE 26th Convention of Electrical and Electronics Engineers in Israel, 2010. https://doi.org/10.1109/eeei.2010.5662159.
  • 52. Terven, J., Cordova-Esparza, D.M., Ramirez-Pedraza, A., Chavez-Urbiola, E.A. Loss functions and metrics in deep learning. arXiv (Cornell University), 2023. https://doi.org/10.48550/arxiv.2307.02694.
  • 53. Santos, C.F.G.D., Papa, J.P. Avoiding overfitting: A survey on regularization methods for convolutional neural networks. ACM Computing Surveys, 2022; 54(10s): 1–25. https://doi.org/10.1145/3510413.
  • 54. Alamri, N.M. Reducing the overfitting in convolutional neural network using nature-inspired algorithm: A novel hybrid approach. Arabian Journal for Science and Engineering, 2024. https://doi.org/10.1007/s13369-024-08998-4.
  • 55. Rajvanshi, S., Kaur, G., Dhatwalia, A., Arunima, N., Singla, A., Bhasin, A. Research on problems and solutions of overfitting in machine learning. In Lecture Notes in Electrical Engineering, 2024; 637–651. https://doi.org/10.1007/978-981-97-2508-3_47.
  • 56. Gupta, S., Gupta, R., Ojha, M., Singh, K.P. A comparative analysis of various regularization techniques to solve overfitting problem in artificial neural networks. In Communications in Computer and Information Science, 2018; 363–371. https://doi.org/10.1007/978-981-10-8527-7_30.
  • 57. Thakkar, A., Lohiya, R. Analyzing fusion of regularization techniques in the deep learning-based intrusion detection system. International Journal of Intelligent Systems, 2021; 36(12): 7340–7388. https://doi.org/10.1002/int.22590.
  • 58. Santos, M.S., Soares, J.P., Abreu, P.H., Araujo, H., Santos, J. Cross-validation for imbalanced datasets: Avoiding overoptimistic and overfitting approaches [Research Frontier]. IEEE Computational Intelligence Magazine, 2018; 13(4): 59–76. https://doi.org/10.1109/mci.2018.2866730.
  • 59. Buda, M., Maki, A., & Mazurowski, M.A. A systematic study of the class imbalance problem in convolutional neural networks. Neural Networks, 2018; 106: 249–259. https://doi.org/10.1016/j.neunet.2018.07.011.
  • 60. Alshomrani, S., Aljoudi, L., Arif, M. Arabic and American sign languages alphabet recognition by convolutional neural network. Advances in Science and Technology – Research Journal, 2021; 15(4): 136–148. https://doi.org/10.12913/22998624/142012.
  • 61. Zhang, J., Fazekas, G., Saitis, C. Fast diffusion GAN model for symbolic music generation controlled by emotions. arXiv (Cornell University), 2023. https://doi.org/10.48550/arxiv.2310.14040.
  • 62. Zhang, H., Xie, L., Qi, K. Implement music generation with GAN: A systematic review. 2021 International Conference on Computer Engineering and Application (ICCEA), 2021. https://doi.org/10.1109/iccea53728.2021.00075.
  • 63. Dong, H., Hsiao, W., Yang, L., Yang, Y. MUSEGAN: Multi-track sequential generative adversarial networks for symbolic music generation and accompaniment. arXiv (Cornell University), 2017. https://doi.org/10.48550/arxiv.1709.06298.
  • 64. Khosla, C., Saini, B.S. Enhancing performance of deep learning models with different data augmentation techniques: A survey. 2020 International Conference on Intelligent Engineering and Management (ICIEM), 2020. https://doi.org/10.1109/iciem48762.2020.9160048.
  • 65. Ying, X. An overview of overfitting and its solutions. Journal of Physics. Conference Series, 2019; 1168: 022022. https://doi.org/10.1088/1742-6596/1168/2/022022.
  • 66. Seo, W., Cho, S., Teisseyre, P., Lee, J. A short survey and comparison of CNN-based music genre classification using multiple spectral features. IEEE Access, 2024; 12: 245–257. https://doi.org/10.1109/access.2023.3346883.
  • 67. Assiri, A.S., Nazir, S., Velastin, S.A. Breast tumor classification using an ensemble machine learning method. Journal of Imaging, 2020; 6(6): 39. https://doi.org/10.3390/jimaging6060039.
  • 68. Zhuang, W., Wang, C., Chai, J., Wang, Y., Shao, M., Xia, S. Music2Dance: DanceNet for music-driven dance generation. ACM Transactions on Multimedia Computing, Communications and Applications, 2022; 18(2): 1–21. https://doi.org/10.1145/3485664.
  • 69. Song, G., Wang, Z., Han, F., Ding, S., Iqbal, M.A. Music auto-tagging using deep recurrent neural networks. Neurocomputing, 2018; 292: 104–110. https://doi.org/10.1016/j.neucom.2018.02.076.
Notes
Record developed with funds from the Ministry of Science and Higher Education (MNiSW), agreement no. POPUL/SP/0154/2024/02, under the programme "Społeczna odpowiedzialność nauki II" (Social Responsibility of Science II), module: Popularization of science (2025).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-15058a20-20ab-4239-ab3f-71e1f566aaae