Identifiers
Title variants
Publication languages
Abstracts
Music Structure Analysis (MSA) is crucial for understanding and leveraging the arrangement of musical compositions in applications such as music information retrieval, multimedia description, and recommendation systems. This paper presents a novel approach to MSA that predicts labels for structural music segments (such as verse or chorus), which could enhance MSA-based applications. The approach is supervised, in contrast to clustering-based methods. Selected pre-trained Convolutional Neural Networks (CNNs), such as VGG, ResNet, and MobileNet, were applied to classify the segments. ResNet50 and DenseNet121 achieved the highest classification accuracy, at 87% and 85.16%, respectively. These results highlight the potential of deep learning models for accurate and efficient labeling of music structure segments, opening possibilities for advanced applications in both offline and real-time music analysis.
Keywords
Year
Volume
Pages
1
Physical description
Bibliography: 42 items, figures, tables.
Authors
author
- Warsaw University of Technology
author
- Warsaw University of Technology
Bibliography
- [1] Y. Chen, "A music recommendation system based on collaborative filtering and SVD," in 2022 IEEE Conference on Telecommunications, Optics and Computer Science (TOCS), Dalian, China, 2022, pp. 1510-1513. https://doi.org/10.1109/TOCS56154.2022.10016210
- [2] D. Sánchez-Moreno, A. B. Gil González, M. D. Muñoz Vicente, V. F. López Batista, and M. N. Moreno García, "A collaborative filtering method for music recommendation using playing coefficients for artists and users," Expert Systems with Applications, vol. 66, pp. 234-244, 2016. https://doi.org/10.1016/j.eswa.2016.09.019
- [3] R.B. Dannenberg and M. Goto, "Music structure analysis from acoustic signals," in Handbook of Signal Processing in Acoustics, Springer, 2008, pp. 305-331. https://doi.org/10.1007/978-0-387-30441-0_21
- [4] R. Zhang, Q. Liu, C. Chun-Gui, J. Wei, and Huiyi-Ma, "Collaborative filtering for recommender systems," in 2014 Second International Conference on Advanced Cloud and Big Data, Huangshan, China, 2014, pp. 301-308. https://doi.org/10.1109/CBD.2014.47
- [5] L. Colley et al., "Elucidation of the relationship between a song's Spotify descriptive metrics and its popularity on various platforms," in 2022 IEEE 46th Annual Computers, Software, and Applications Conference (COMPSAC), Los Alamitos, CA, USA, 2022, pp. 241-249. https://doi.org/10.1109/COMPSAC54236.2022.00042
- [6] O. Nieto and J. P. Bello, "Systematic Exploration Of Computational Music Structure Research," in Proc. of the 17th International Society for Music Information Retrieval Conference (ISMIR). New York City, NY, USA, 2016. ISMIR2016-NietoBello.pdf
- [7] Salami Annotator Guide, [Online]. https://github.com/DDMAL/salami-data-public/blob/master/SALAMI%20Annotator%20Guide.pdf
- [8] O. Nieto et al., "Audio-based music structure analysis: Current trends, open challenges, and applications," Transactions of the International Society for Music Information Retrieval, vol. 3, no. 1, pp. 246-263, 2020. https://doi.org/10.5334/tismir.54
- [9] K. Jensen, "Multiple scale music segmentation using rhythm, timbre, and harmony," EURASIP Journal on Advances in Signal Processing, vol. 2007. https://doi.org/10.1155/2007/73205
- [10] M. C. McCallum, "Unsupervised learning of deep features for music segmentation," in ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 2019, pp. 346-350. https://doi.org/10.1109/ICASSP.2019.8683407
- [11] J. Paulus, M. Müller, and A. Klapuri, "Audio-based music structure analysis," in Proc. of the 11th International Society for Music Information Retrieval Conference, ISMIR 2010, pp. 625-636. https://ismir2010.ismir.net/proceedings/ismir2010-107.pdf
- [12] X. Li, R. Liu, and M. Li, "A review on objective music structure analysis," in 2009 International Conference on Information and Multimedia Technology, Jeju, Korea (South), 2009, pp. 226-229. https://doi.org/10.1109/ICIMT.2009.20
- [13] J. Foote, "Automatic audio segmentation using a measure of audio novelty," 2000 IEEE International Conference on Multimedia and Expo. ICME2000. Proceedings. Latest Advances in the Fast Changing World of Multimedia (Cat. No.00TH8532), New York, NY, USA, 2000, pp. 452-455 vol.1. https://doi.org/10.1109/ICME.2000.869637
- [14] L. Lu, M. Wang, and H. Zhang, "Repeating pattern discovery and structure analysis from acoustic music data," In Proc. of the 6th ACM SIGMM international workshop on Multimedia information retrieval (MIR '04). Association for Computing Machinery, New York, NY, USA, pp. 275-282. https://doi.org/10.1145/1026711.1026756
- [15] Y. Shiu, H. Jeong, and C. J. Kuo, "Musical structure analysis using similarity matrix and dynamic programming," in Proc. SPIE 6015, Multimedia Systems and Applications VIII, 601516, 24 October 2005. https://doi.org/10.1117/12.633792
- [16] B. McFee and D. P.W. Ellis, "Analyzing song structure with spectral clustering," in Proc. of 15th International Society for Music Information Retrieval Conference, ISMIR 2014, Taipei, Taiwan, October 27-31, 2014, pp. 405-410.
- [17] J. Serra, M. Müller, P. Grosche, and J. L. Arcos, "Unsupervised music structure annotation by time series structure features and segment similarity," in IEEE Transactions on Multimedia, vol. 16, no. 5, pp. 1229-1240, Aug. 2014. https://doi.org/10.1109/TMM.2014.2310701
- [18] T. Cheng, J. B. Smith, and M. Goto, "Music structure boundary detection and labelling by a deconvolution of path-enhanced self-similarity matrix," in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 2018, pp. 106-110. https://doi.org/10.1109/ICASSP.2018.8461319
- [19] M. Sandler and J. J. Aucouturier, "Segmentation of musical signals using hidden Markov models," in Audio Engineering Society. 110, 2001.
- [20] G. Peeters, A. La Burthe, and X. Rodet, "Toward automatic music audio summary generation from signal analysis," in Proceedings International Conference on Music Information Retrieval, 2002.
- [21] J. Paulus and A. Klapuri, "Music structure analysis by finding repeated parts," in Proc. of the 1st ACM workshop on Audio and music computing multimedia (AMCMM '06). Association for Computing Machinery, New York, NY, USA, pp. 59-68. https://doi.org/10.1145/1178723.1178733
- [22] M. Levy and M. B. Sandler, "Structural segmentation of musical audio by constrained clustering," in IEEE Transactions on Audio, Speech, and Language Processing, vol. 16, no. 2, pp. 318-326, Feb. 2008. https://doi.org/10.1109/TASL.2007.910781
- [23] J. Pauwels, F. Kaiser, and G. Peeters, "Combining harmony-based and novelty-based approaches for structural segmentation," in International Society for Music Information Retrieval, 2013.
- [24] K. Ullrich, J. Schlüter, and T. Grill, "Boundary detection in music structure analysis using convolutional neural networks," in Proc. 15th International Society for Music Information Retrieval Conference, 2014.
- [25] T. Grill and J. Schlüter, "Music boundary detection using neural networks on spectrograms and self-similarity lag matrices," in 2015 23rd European Signal Processing Conference (EUSIPCO), Nice, France, 2015, pp. 1296-1300. https://doi.org/10.1109/EUSIPCO.2015.7362593
- [26] T. O’Brien, "Musical structure segmentation with convolutional neural networks," in Proc. of the 17th International Society for Music Information Retrieval Conference, 2016.
- [27] M. A. Bartsch and G. H. Wakefield, "To catch a chorus: Using chroma-based representations for audio thumbnailing," in Proc. 2001 IEEE Workshop on the Applications of Signal Processing to Audio and Acoustics (Cat. No.01TH8575), New Paltz, NY, USA, 2001, pp. 15-18. https://doi.org/10.1109/ASPAA.2001.969531
- [28] M. Goto, "A chorus-section detecting method for musical audio signals," in 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proc. (ICASSP '03)., Hong Kong, China, 2003, pp. V-437. https://doi.org/10.1109/ICASSP.2003.1200000
- [29] A. Eronen, "Chorus detection with combined use of MFCC and chroma features and image processing filters," in Proc. of the 10th Int. Conference on Digital Audio Effects (DAFx-07), Bordeaux, France, September 10-15, 2007.
- [30] S. Gao and H. Li, "Popular song summarization using chorus section detection from audio signal," in 2015 IEEE 17th International Workshop on Multimedia Signal Processing (MMSP), Xiamen, China, 2015, pp. 1-6. https://ieeexplore.ieee.org/document/7340798
- [31] J. C. Wang, J. B. Smith, J. Chen, X. Song, and Y. Wang, "Supervised chorus detection for popular music using convolutional neural network and multi-task learning," arXiv preprint arXiv:2103.14253. https://doi.org/10.48550/arXiv.2103.14253
- [32] Q. He, X. Sun, Y. Yu, and W. Li, "Deepchorus: A hybrid model of multi-scale convolution and self-attention for chorus detection," ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, Singapore, 2022, pp. 411-415. https://doi.org/10.1109/ICASSP43922.2022.9746919
- [33] G. Shibata, R. Nishikimi, and K. Yoshii, "Music structure analysis based on an LSTM-HSMM hybrid model," in Proc. of the 21st Int. Society for Music Information Retrieval Conf., Montréal, Canada, 2020.
- [34] J. Wang, Y. Hung, and J. B. L. Smith, "To catch a chorus, verse, intro, or anything else: Analyzing a song with structural functions," in ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, Singapore, 2022, pp. 416-420. https://doi.org/10.1109/ICASSP43922.2022.9747252
- [35] A. Marmoret, J. E. Cohen, and F. Bimbot, "Convolutive block-matching segmentation algorithm with application to music structure analysis," 2023 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), New Paltz, NY, USA, 2023, pp. 1-5. https://doi.org/10.1109/WASPAA58266.2023.10248174
- [36] T. Kim and J. Nam, "All-in-one metrical and functional structure analysis with neighborhood attentions on demixed audio," arXiv preprint arXiv:2307.16425. https://doi.org/10.48550/arXiv.2307.16425
- [37] J. B. L. Smith, J. A. Burgoyne, I. Fujinaga, D. De Roure, and J. S. Downie, "Design and creation of a large-scale database of structural annotations," in Proc. of the International Society for Music Information Retrieval Conference, Miami, FL, 2011, pp. 555-560.
- [38] SALAMI Dataset, [Online]. Available: https://github.com/DDMAL/salami-data-public
- [39] B. Zhang, J. Leitner, and S. Thornton, "Audio recognition using mel spectrograms and convolution neural networks," 2019.
- [40] E. Waisberg et al., "Transfer learning as an AI-based solution to address limited datasets in space medicine," Life Sciences in Space Research, vol. 36, pp. 36-38, 2023. https://doi.org/10.1016/j.lssr.2022.12.002
- [41] J. Deng, W. Dong, R. Socher, L. J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 2009, pp. 248-255. https://doi.org/10.1109/CVPR.2009.5206848
- [42] A. A. Dina, T. S. Asmaa, T. N. Marwa, and H. J. Ali, "Classification of COVID-19 from CT chest images using convolutional wavelet neural network," International Journal of Electrical and Computer Engineering, vol. 13, no. 1, 2023. https://doi.org/10.11591/ijece.v13i1.pp1078-1085
Notes
Record prepared with funding from MNiSW, agreement no. POPUL/SP/0154/2024/02, under the programme "Social Responsibility of Science II", module: Popularization of Science (2025).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-8f2fa7a7-4c2d-4579-9ba2-64ac58b045c8