Article title
Content
Full texts:
Identifiers
Title variants
Publication languages
Abstracts
This work aims to develop a deep model for automatically labeling music tracks in terms of induced emotions. The machine learning architecture consists of two components: one dedicated to lyric processing based on Natural Language Processing (NLP) and another devoted to music processing. These two components are combined at the decision-making level. To achieve this, a range of neural networks is explored for the task of emotion extraction from both lyrics and music. For lyric classification, three architectures are compared: a 4-layer neural network, FastText, and a transformer-based approach. For music classification, the architectures investigated include InceptionV3, a collection of models from the ResNet family, and a joint architecture combining Inception and ResNet. An SVM serves as a baseline in both threads. The study examines three datasets of songs accompanied by lyrics, with MoodyLyrics4Q selected and preprocessed for model training. The bimodal approach, incorporating both the lyrics and audio modules, achieves a classification accuracy of 60.7% in identifying emotions evoked by music pieces. The MoodyLyrics4Q dataset used in this study encompasses musical pieces spanning diverse genres, including rock, jazz, electronic, pop, blues, and country. The algorithms demonstrate reliable performance across the dataset, highlighting their robustness in handling a wide variety of musical styles.
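The abstract describes decision-level (late) fusion of the lyrics and audio branches. The sketch below illustrates such fusion under the assumption that each branch outputs a probability distribution over the four MoodyLyrics4Q emotion classes; the class labels follow the dataset's quadrants, while the fusion weight, function names, and example values are illustrative assumptions rather than details taken from the paper.

```python
# Decision-level (late) fusion sketch: each branch model emits softmax
# probabilities over the four MoodyLyrics4Q classes; the fused decision is
# the argmax of a weighted average of the two distributions.
import numpy as np

CLASSES = ["happy", "angry", "sad", "relaxed"]  # MoodyLyrics4Q quadrant labels

def late_fusion(p_lyrics: np.ndarray, p_audio: np.ndarray, alpha: float = 0.5) -> str:
    """Combine the two branch outputs, weighting the lyrics branch by alpha."""
    assert p_lyrics.shape == p_audio.shape == (len(CLASSES),)
    p_joint = alpha * p_lyrics + (1.0 - alpha) * p_audio
    return CLASSES[int(np.argmax(p_joint))]

# Hypothetical branch outputs for one song: both branches favour "sad".
p_lyrics = np.array([0.10, 0.05, 0.60, 0.25])
p_audio = np.array([0.20, 0.10, 0.45, 0.25])
print(late_fusion(p_lyrics, p_audio))  # -> sad
```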
Keywords
Publisher
Year
Volume
Pages
215-238
Physical description
Bibliography: 63 items, figures
Authors
author
- Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, 11/12 Narutowicza St., 80-233 Gdańsk, Poland
author
- Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology, 11/12 Narutowicza St., 80-233 Gdańsk, Poland
author
- Department of Computer Science, Universitat Politècnica de Catalunya, Jordi Girona Salgado, 1-3, 08034 Barcelona, Spain
author
- Audio Acoustics Laboratory, Faculty of Electronics, Telecommunications, and Informatics, Gdańsk University of Technology, 11/12 Narutowicza St., 80-233 Gdańsk, Poland
Bibliography
- [1] L. Smietanka and T. Maka, “Interpreting convolutional layers in DNN model based on time–frequency representation of emotional speech,” Journal of Artificial Intelligence and Soft Computing Research, vol. 14, no. 1, pp. 5–23, Jan. 2024, doi: 10.2478/jaiscr-2024-0001.
- [2] S. Sheykhivand, Z. Mousavi, T. Y. Rezaii, and A. Farzamnia, “Recognizing Emotions Evoked by Music Using CNN-LSTM Networks on EEG Signals,” IEEE Access, vol. 8, pp. 139332-139345, 2020, doi: 10.1109/ACCESS.2020.3011882.
- [3] Y. Takahashi, T. Hochin, and H. Nomiya, “Relationship between Mental States with Strong Emotion Aroused by Music Pieces and Their Feature Values,” in Proc. 2014 IIAI 3rd International Conference on Advanced Applied Informatics, 2014, pp. 718-725, doi: 10.1109/IIAIAAI.2014.147.
- [4] P. A. Wood and S. K. Semwal, “On exploring the connection between music classification and evoking emotion,” in Proc. 2015 International Conference on Collaboration Technologies and Systems (CTS), 2015, pp. 474-476, doi: 10.1109/CTS.2015.7210471.
- [5] M. Agapaki, E. A. Pinkerton, and E. Papatzikis, “Music and neuroscience research for mental health, cognition, and development: Ways forward,” Frontiers in Psychology, vol. 13, 2022, doi: 10.3389/fpsyg.2022.976883.
- [6] Y. Song, S. Dixon, M. Pearce, and A. Halpern, “Perceived and Induced Emotion Responses to Popular Music: Categorical and Dimensional Models,” Music Perception: An Interdisciplinary Journal, vol. 33, pp. 472-492, Apr. 2016, doi: 10.1525/mp.2016.33.4.472.
- [7] Y. Yuan, “Emotion of Music: Extraction and Composing,” Journal of Education, Humanities and Social Sciences, vol. 13, pp. 422-428, May 2023, doi: 10.54097/ehss.v13i.8207.
- [8] S. A. Sujeesha, J. B. Mala, and R. Rajeev, “Automatic music mood classification using multi-modal attention framework,” Engineering Applications of Artificial Intelligence, vol. 128, p. 107355, 2024, doi: 10.1016/j.engappai.2023.107355.
- [9] M. Schedl, P. Knees, B. McFee, D. Bogdanov, and M. Kaminskas, “Music recommender systems,” in Recommender systems handbook, Springer, 2015, pp. 453-492.
- [10] MorphCast Technology. Available: https://www.morphcast.com. Accessed: November 2024.
- [11] S. Zhao, G. Jia, J. Yang, G. Ding, and K. Keutzer, “Emotion Recognition From Multiple Modalities: Fundamentals and methodologies,” IEEE Signal Processing Magazine, vol. 38, no. 6, pp. 59-73, Nov. 2021, doi: 10.1109/msp.2021.3106895.
- [12] T. Li, “Music emotion recognition using deep convolutional neural networks,” Journal of Computational Methods in Science and Engineering, vol. 24, no. 4-5, pp. 3063-3078, 2024, doi: 10.3233/JCM-247551.
- [13] P. L. Louro, H. Redinho, R. Malheiro, R. P. Paiva, and R. Panda, “A comparison study of deep learning methodologies for music emotion recognition,” Sensors, vol. 24, no. 7, p. 2201, 2024, doi: 10.3390/s24072201.
- [14] M. Blaszke, G. Korvel, and B. Kostek, “Exploring neural networks for musical instrument identification in polyphonic audio,” IEEE Intelligent Systems, pp. 1-11, 2024, doi: 10.1109/mis.2024.3392586.
- [15] M. Barata and P. Coelho, “Music Streaming Services: Understanding the drivers of customer purchase and intention to recommend,” Heliyon, vol. 7, p. e07783, Aug. 2021, doi: 10.1016/j.heliyon.2021.e07783.
- [16] J. Webster, “The promise of personalization: Exploring how music streaming platforms are shaping the performance of class identities and distinction,” New Media & Society, p. 146144482110278, Jul. 2021, doi: 10.1177/14614448211027863.
- [17] E. Schmidt, D. Turnbull, and Y. Kim, “Feature selection for content-based, time-varying musical emotion regression,” in Proc. ACM SIGMM International Conference on Multimedia Information Retrieval, Mar. 2010, pp. 267-274, doi: 10.1145/1743384.1743431.
- [18] Y.-H. Yang, Y.-C. Lin, H.-T. Cheng, I.-B. Liao, Y.-C. Ho, and H. H. Chen, “Toward Multimodal Music Emotion Classification,” in Advances in Multimedia Information Processing - PCM 2008, 2008, pp. 70-79.
- [19] T. Ciborowski, S. Reginis, D. Weber, A. Kurowski, and B. Kostek, “Classifying Emotions in Film Music—A Deep Learning Approach,” Electronics, vol. 10, no. 23, p. 2955, Nov. 2021, doi: 10.3390/electronics10232955.
- [20] X. Han, F. Chen, and J. Ban, “Music Emotion Recognition Based on a Neural Network with an Inception-GRU Residual Structure,” Electronics, vol. 12, no. 4, p. 978, Feb. 2023, doi: 10.3390/electronics12040978.
- [21] Y. J. Liao, W. C. Wang, S.-J. Ruan, Y. H. Lee, and S. C. Chen, “A Music Playback Algorithm Based on Residual-Inception Blocks for Music Emotion Classification and Physiological Information,” Sensors, vol. 22, no. 3, p. 777, Jan. 2022, doi: 10.3390/s22030777.
- [22] R. Sarkar, S. Choudhury, S. Dutta, A. Roy, and S. K. Saha, “Recognition of emotion in music based on deep convolutional neural network,” Multimedia Tools and Applications, vol. 79, pp. 765-783, 2019, [Online]. Available: https://api.semanticscholar.org/CorpusID:254866914.
- [23] S. Giammusso, M. Guerriero, P. Lisena, E. Palumbo, and R. Troncy, “Predicting the emotion of playlists using track lyrics,” International Society for Music Information Retrieval ISMIR, Late Breaking Session, 2017.
- [24] Y. Agrawal, R. Shanker, and V. Alluri, “Transformer-based approach towards music emotion recognition from lyrics,” in Advances in Information Retrieval. ECIR 2021. Lecture Notes in Computer Science, vol. 12657, Springer, 2021, doi: 10.1007/978-3-030-72240-1_12.
- [25] D. Han, Y. Kong, H. Jiayi, and G. Wang, “A survey of music emotion recognition,” Frontiers of Computer Science, vol. 16, Dec. 2022, doi: 10.1007/s11704-021-0569-4.
- [26] T. Baltrušaitis, C. Ahuja, and L. -P. Morency, “Multimodal Machine Learning: A Survey and Taxonomy,” in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 2, pp. 423-443, 1 Feb. 2019, doi: 10.1109/TPAMI.2018.2798607.
- [27] R. Delbouys, R. Hennequin, F. Piccoli, J. Royo-Letelier, and M. Moussallam, “Music Mood Detection Based On Audio And Lyrics With Deep Neural Net,” in Proc. ISMIR 2018, doi: 10.48550/arXiv.1809.07276.
- [28] I. A. P. Santana et al., “Music4all: A new music database and its applications,” in Proc. 2020 International Conference on Systems, Signals and Image Processing (IWSSIP), 2020, pp. 399-404, doi: 10.1109/IWSSIP48289.2020.9145170.
- [29] E. Çano and M. Morisio, “Moodylyrics: A sentiment annotated lyrics dataset,” in Proc. 2017 International Conference on Intelligent Systems, Metaheuristics & Swarm Intelligence, 2017, pp. 118-124, doi: 10.1145/3059336.3059340.
- [30] E. Çano and M. Morisio, “Music mood dataset creation based on Last.fm tags,” in Proc. 2017 International Conference on Artificial Intelligence and Applications, Vienna, Austria, 2017, pp. 15-26, doi: 10.5121/csit.2017.70603.
- [31] R. E. Thayer, The Biopsychology of Mood and Arousal. Oxford University Press, 1989.
- [32] J. Russell, “A Circumplex Model of Affect,” Journal of Personality and Social Psychology, vol. 39, pp. 1161-1178, Dec. 1980, doi: 10.1037/h0077714.
- [33] Social music service - Last.fm. Available: https://www.last.fm/. Accessed: November 2024.
- [34] Genius - Song Lyrics & Knowledge. Available: https://genius.com/. Accessed: November 2024.
- [35] YouTube. Available: https://www.youtube.com. Accessed: November 2024.
- [36] M. Sakowicz and J. Tobolewski, “Development and study of an algorithm for the automatic labeling of musical pieces in the context of emotion evoked,” M.Sc. thesis, Gdansk University of Technology and Universitat Politècnica de Catalunya (co-supervised by B. Kostek and J. Turmo), 2023.
- [37] Genius and Spotify partnering. Available: https://genius.com/a/genius-and-spotify-together. Accessed: November 2024.
- [38] Pafy library. Available: https://pypi.org/project/pafy/. Accessed: November 2024.
- [39] Moviepy library. Available: https://pypi.org/project/moviepy/. Accessed: November 2024.
- [40] M. Honnibal and I. Montani, “spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing,” 2017. Available: https://github.com/explosion/spaCy. Accessed: November 2024.
- [41] P. N. Johnson-Laird and K. Oatley, “Emotions, Simulation, and Abstract Art,” Art & Perception, vol. 9, no. 3, pp. 260-292, 2021, doi: 10.1163/22134913-bja10029.
- [42] P. N. Johnson-Laird and K. Oatley, “How poetry evokes emotions,” Acta Psychologica, vol. 224, p. 103506, 2022, doi: 10.1016/j.actpsy.2022.103506.
- [43] J. Pennington, R. Socher, and C. Manning, “GloVe: Global Vectors for Word Representation,” in Proc. 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Oct. 2014, pp. 1532-1543, doi: 10.3115/v1/D14-1162.
- [44] SpaCy - pre-trained pipeline for English. Available: https://spacy.io/models/en#en_core_web_lg. Accessed: November 2024.
- [45] S. Loria, “Textblob Documentation,” Release 0.15, vol. 2, 2018. Available: https://textblob.readthedocs.io/en/dev/. Accessed: November 2024.
- [46] F. Pedregosa et al., “Scikit-learn: Machine Learning in Python,” Journal of Machine Learning Research, vol. 12, no. 85, pp. 2825-2830, 2011. Available: http://jmlr.org/papers/v12/pedregosa11a.html. Accessed: November 2024.
- [47] Guns N’ Roses, “Paradise City” lyrics. Available: https://genius.com/Guns-n-roses-paradise-city-lyrics.
- [48] FastText - text classification tutorial. Available: https://fasttext.cc/docs/en/supervised-tutorial.html. Accessed: November 2024.
- [49] T. Wolf et al., “Transformers: State-of-the-Art Natural Language Processing,” in Proc. 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 2020, pp. 38-45, doi: 10.18653/v1/2020.emnlp-demos.6.
- [50] XLNet (base-sized model). Available: https://huggingface.co/xlnet-base-cased. Accessed: November 2024.
- [51] Z. Yang, Z. Dai, Y. Yang, J. Carbonell, R. R. Salakhutdinov, and Q. V. Le, “XLNet: Generalized autoregressive pretraining for language understanding,” in Advances in Neural Information Processing Systems, vol. 32, 2019, doi: 10.48550/arXiv.1906.08237.
- [52] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the Inception Architecture for Computer Vision,” in Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Jun. 2016, pp. 2818-2826, doi: 10.1109/CVPR.2016.308.
- [53] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 2016, pp. 770-778, doi: 10.1109/CVPR.2016.90.
- [54] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” in Proc. 3rd International Conference on Learning Representations (ICLR 2015), 2015, pp. 1-14, doi: 10.48550/arXiv.1409.1556.
- [55] Librosa library. Available: https://librosa.org/. Accessed: November 2024.
- [56] F. Chollet et al., “Keras,” 2015. Available: https://github.com/fchollet/keras. Accessed: November 2024.
- [57] TensorFlow library. Available: https://www.tensorflow.org/. Accessed: November 2024.
- [58] S. C. Huang, A. Pareek, S. Seyyedi, I. Banerjee, and M. Lungren, “Fusion of medical imaging and electronic health records using deep learning: a systematic review and implementation guidelines,” npj Digital Medicine, vol. 3, 2020, doi: 10.1038/s41746-020-00341-z.
- [59] A. Paszke et al., “PyTorch: An Imperative Style, High-Performance Deep Learning Library,” in Advances in Neural Information Processing Systems, vol. 32, Curran Associates, Inc., 2019, pp. 8024-8035.
- [60] Combining two deep learning models. Available: https://control.com/technical-articles/combining-two-deep-learning-models/. Accessed: November 2024.
- [61] Y. Shi, A. Karatzoglou, L. Baltrunas, M. Larson, A. Hanjalic, and N. Oliver, “TFMAP: Optimizing MAP for top-n context-aware recommendation,” in Proc. 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, Portland, Oregon, USA, Aug. 2012, pp. 155-164, doi: 10.1145/2348283.2348308.
- [62] K. Pyrovolakis, P. K. Tzouveli, and G. Stamou, “Multi-Modal Song Mood Detection with Deep Learning,” Sensors, vol. 22, 2022, doi: 10.3390/s22031065.
- [63] E. N. Shaday, V. J. L. Engel, and H. Heryanto, “Application of the Bidirectional Long Short-Term Memory Method with Comparison of Word2Vec, GloVe, and FastText for Emotion Classification in Song Lyrics,” Procedia Computer Science, vol. 245, pp. 137-146, 2024, doi: 10.1016/j.procs.2024.10.237.
Notes
Record prepared with funding from the Ministry of Science and Higher Education (MNiSW), agreement no. POPUL/SP/0154/2024/02, under the "Social Responsibility of Science II" programme - module: Science Popularization (2025).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-10e80e43-0cd7-446c-9d68-f93e999658c0