Article title

Structured Gaussian Process Regression of Music Mood

Identifiers
Title variants
Languages of publication
EN
Abstracts
EN
Modeling music mood has wide applications in music categorization, retrieval, and recommendation systems; however, the affective content of music is challenging to model computationally due to its subjective nature. In this work, a structured regression framework is proposed that models the valence and arousal mood dimensions of music with a single regression model at linear computational cost. To tackle the subjectivity, a confidence-interval based estimated consensus is computed by modeling the behavior of the various annotators (e.g., biased or adversarial), and is shown to outperform the use of average annotation values. For a compact feature representation of music clips, variational Bayesian inference is used to learn a Gaussian mixture model representation of the acoustic features, and chord-related features, obtained by probing the chord progressions between chroma frames, are used to improve the valence estimation. The dimensionality of the features is further reduced with an adaptive version of kernel PCA. Using an efficient implementation of the twin Gaussian process for structured regression, the proposed approach achieves a significant improvement in R² for both the arousal and valence dimensions relative to state-of-the-art techniques on two benchmark datasets for music mood estimation.
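The pipeline the abstract describes (variational-Bayes GMM clip representation, kernel-PCA reduction, Gaussian-process regression of valence/arousal) can be sketched with off-the-shelf scikit-learn components. This is a minimal illustrative sketch, not the paper's method: `BayesianGaussianMixture` stands in for the variational-Bayes GMM, plain `KernelPCA` for the adaptive variant, and a standard `GaussianProcessRegressor` for the twin Gaussian process; all data, component counts, and parameters below are assumptions for illustration.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture
from sklearn.decomposition import KernelPCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

# Synthetic stand-in for per-frame acoustic features of N music clips.
n_clips, n_frames, n_feats = 40, 100, 8
clips = rng.normal(size=(n_clips, n_frames, n_feats))

# 1) Variational-Bayes GMM fitted on all frames; each clip is summarized by
#    its mean posterior responsibilities over the mixture components.
gmm = BayesianGaussianMixture(n_components=5, max_iter=200, random_state=0)
gmm.fit(clips.reshape(-1, n_feats))
clip_repr = np.stack([gmm.predict_proba(c).mean(axis=0) for c in clips])

# 2) Kernel PCA reduces the clip-level representation further.
kpca = KernelPCA(n_components=3, kernel="rbf")
X = kpca.fit_transform(clip_repr)

# 3) GP regression of the two mood dimensions (valence, arousal) jointly;
#    the annotations here are random placeholders.
Y = rng.uniform(-1.0, 1.0, size=(n_clips, 2))
gp = GaussianProcessRegressor(kernel=RBF(), normalize_y=True)
gp.fit(X, Y)
pred = gp.predict(X[:5])
print(pred.shape)  # one (valence, arousal) pair per clip
```

The twin Gaussian process of the paper differs from this sketch in that it also places a GP prior on the structured output space and matches input and output kernels, which is what makes the joint valence/arousal prediction "structured" rather than two independent regressions.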
Publisher
Year
Pages
183–203
Physical description
Bibliography: 42 items; figures, tables, charts
Authors
  • Dept. of Electronics and Telecommunication Engineering, St. Francis Institute of Technology, University of Mumbai, India
  • Dept. of Electronics and Telecommunication Engineering, St. Francis Institute of Technology, University of Mumbai, India
Bibliography
  • [1] Brinker Bd, Dinther Rv, Skowronek J. Expressed music mood classification compared with valence and arousal ratings. EURASIP Journal on Audio, Speech, and Music Processing, 2012. 2012(1):24. URL https://doi.org/10.1186/1687-4722-2012-24.
  • [2] Hu X. Music and mood: Where theory and reality meet. In: Proc. of the 5th iConference. 2010 URL http://hdl.handle.net/2142/14956.
  • [3] Chen YA, Yang YH, Wang JC, Chen H. The AMG1608 dataset for music emotion recognition. In: Proc. of the IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015 pp. 693-697. URL https://doi.org/10.1109/ICASSP.2015.7178058.
  • [4] Chin YH, Jia-Ching W, Wang JC, Yang YH. Predicting the probability density function of music emotion using emotion space mapping. IEEE Trans. on Affective Computing, 2018. 9(4):541-549. URL https://doi.org/10.1109/TAFFC.2016.2628794.
  • [5] Kumar N, Guha T, Huang CW, Vaz C, Narayanan SS. Novel affective features for multiscale prediction of emotion in music. In: Proc. of the IEEE Intl. Workshop on Multimedia Signal Processing (MMSP). IEEE, 2016 pp. 1-5. URL https://doi.org/10.1109/MMSP.2016.7813377.
  • [6] Bo L, Sminchisescu C. Twin Gaussian processes for structured prediction. Intl. Jour. of Computer Vision, 2009. 87(1):28. URL https://doi.org/10.1007/s11263-008-0204-y.
  • [7] Chapaneri S, Jayaswal D. Structured prediction of music mood with twin Gaussian processes. In: Proc. of the Intl. Conf. on Pattern Recognition and Machine Intelligence (PReMI). Springer, 2017 pp. 647-654. URL https://doi.org/10.1007/978-3-319-69900-4_82.
  • [8] Fukuyama S, Goto M. Music emotion recognition with adaptive aggregation of Gaussian process regressors. In: Proc. of the IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2016 pp. 71-75. URL https://doi.org/10.1109/ICASSP.2016.7471639.
  • [9] Wang JC, Yang YH, Wang HM, Jeng SK. Modeling the affective content of music with a Gaussian mixture model. IEEE Trans. on Affective Computing, 2015. 6(1):56-68. URL https://doi.org/10.1109/TAFFC.2015.2397457.
  • [10] Wang JC, Wang HM, Lanckriet G. A histogram density modeling approach to music emotion recognition. In: Proc. of the IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2015 pp. 698-702. URL https://doi.org/10.1109/ICASSP.2015.7178059.
  • [11] Liu Y, Liu Y, Zhao Y, Hua KA. What strikes the strings of your heart? Feature mining for music emotion analysis. IEEE Trans. on Affective Computing, 2015. 6(3):247-260. URL https://doi.org/10.1109/TAFFC.2015.2396151.
  • [12] Panda R, Malheiro RM, Paiva RP. Novel audio features for music emotion recognition. IEEE Trans. on Affective Computing, 2018. (1):1-1. URL http://doi.ieeecomputersociety.org/10.1109/TAFFC.2018.2820691.
  • [13] Yang YH, Chen HH. Machine recognition of music emotion: A review. ACM Trans. on Intelligent Systems and Technology (TIST), 2012. 3(3):40. URL https://doi.org/10.1145/2168752.2168754.
  • [14] Raykar VC, Yu S, Zhao LH, Valadez GH, Florin C, Bogoni L, Moy L. Learning from crowds. Journal of Machine Learning Research, 2010. 11(Apr):1297-1322. URL http://www.jmlr.org/papers/v11/raykar10a.html.
  • [15] Raykar VC, Yu S. Eliminating spammers and ranking annotators for crowdsourced labeling tasks. Journal of Machine Learning Research, 2012. 13(Feb):491-518. URL http://www.jmlr.org/papers/v13/raykar12a.html.
  • [16] Chatterjee S, Mukhopadhyay A, Bhattacharyya M. A review of judgment analysis algorithms for crowdsourced opinions. IEEE Transactions on Knowledge and Data Engineering, 2019. URL https://doi.org/10.1109/TKDE.2019.2904064.
  • [17] Li Y, Gao J, Meng C, Li Q, Su L, Zhao B, Fan W, Han J. A survey on truth discovery. ACM SIGKDD Explorations Newsletter, 2016. 17(2):1-16. URL https://www.kdd.org/exploration_files/Article1_17_2.pdf.
  • [18] Wan M, Chen X, Kaplan L, Han J, Gao J, Zhao B. From truth discovery to trustworthy opinion discovery: An uncertainty-aware quantitative modeling approach. In: Proc. of the ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining. ACM, 2016 pp. 1885-1894. URL https://doi.org/10.1145/2939672.2939837.
  • [19] Ramakrishna A, Gupta R, Grossman RB, Narayanan SS. An Expectation Maximization approach to joint modeling of multidimensional ratings derived from multiple annotators. In: InterSpeech. 2016 pp. 1555-1559. URL http://dx.doi.org/10.21437/Interspeech.2016-270.
  • [20] Xiao H, Xiao H, Eckert C. Learning from multiple observers with unknown expertise. In: Proc. of the Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD), volume 7818. Springer, 2013 pp. 595-606. URL https://doi.org/10.1007/978-3-642-37453-1_49.
  • [21] Markov K, Matsui T. Music genre and emotion recognition using Gaussian processes. IEEE Access, 2014. 2:688-697. URL https://doi.org/10.1109/ACCESS.2014.2333095.
  • [22] Zhang JL, Huang XL, Yang LF, Xu Y, Sun ST. Feature selection and feature learning in arousal dimension of music emotion by using shrinkage methods. Multimedia Systems, 2017. 23(2):251-264. URL https://doi.org/10.1007/s00530-015-0489-y.
  • [23] Hu X, Yang YH. Cross-dataset and cross-cultural music mood prediction: A case on Western and Chinese pop songs. IEEE Trans. on Affective Computing, 2017. 8(2):228-240. URL https://doi.org/10.1109/TAFFC.2016.2523503.
  • [24] Liu T, Han L, Ma L, Guo D. Audio-based deep music emotion recognition. In: AIP Conference Proceedings, volume 1967. 2018 p. 040021. URL https://doi.org/10.1063/1.5039095.
  • [25] Tripathi S, Acharya S, Sharma RD, Mittal S, Bhattacharya S. Using deep and convolutional neural networks for accurate emotion classification on DEAP dataset. In: Proc. of Innovative Applications of Artificial Intelligence. 2017 pp. 4746-4752. URL https://aaai.org/ocs/index.php/IAAI/IAAI17/paper/view/15007/13731.
  • [26] Ni Y, McVicar M, Santos-Rodriguez R, De Bie T. An end-to-end machine learning system for harmonic analysis of music. IEEE Trans. on Audio, Speech and Language Processing, 2012. 20(6):1771-1783. URL https://doi.org/10.1109/TASL.2012.2188516.
  • [27] Cheng HT, Yang YH, Lin YC, Liao IB, Chen HH, et al. Automatic chord recognition for music classification and retrieval. In: Proc. of the IEEE Intl. Conf. on Multimedia and Expo (ICME). IEEE, 2008 pp. 1505-1508. URL http://doi.ieeecomputersociety.org/10.1109/ICME.2008.4607732.
  • [28] Yu Y, Zimmermann R, Wang Y, Oria V. Scalable content-based music retrieval using chord progression histogram and tree-structure LSH. IEEE Trans. on Multimedia, 2013. 15(8):1969-1981. URL https://doi.org/10.1109/TMM.2013.2269313.
  • [29] Williams CK, Rasmussen CE. Gaussian processes for machine learning. The MIT Press, 2006. URL http://www.gaussianprocess.org/gpml/.
  • [30] Elhoseiny M, Elgammal A. Generalized twin Gaussian processes using Sharma-Mittal divergence. Machine Learning, 2015. 100(2-3):399-424. URL https://doi.org/10.1007/s10994-015-5497-9.
  • [31] Yamada M, Sigal L, Chang Y. Domain adaptation for structured regression. Intl. Jour. of Computer Vision, 2014. 109(1-2):126-145. URL https://doi.org/10.1007/s11263-013-0689-x.
  • [32] Aljanaki A, Yang YH, Soleymani M. Developing a benchmark for emotional analysis of music. PloS one, 2017. 12(3):e0173392. URL https://doi.org/10.1371/journal.pone.0173392.
  • [33] Rousseeuw PJ, Driessen KV. A fast algorithm for the minimum covariance determinant estimator. Technometrics, 1999. 41(3):212-223. URL https://www.tandfonline.com/doi/abs/10.1080/00401706.1999.10485670.
  • [34] Pasternack J, Roth D. Knowing what to believe (when you already know something). In: Proceedings of the 23rd International Conference on Computational Linguistics. Association for Computational Linguistics, 2010 pp. 877-885. URL https://dl.acm.org/citation.cfm?id=1873880.
  • [35] Lartillot O, Toiviainen P, Eerola T. A Matlab toolbox for music information retrieval. In: Data analysis, machine learning and applications: Studies in classification, data analysis, and knowledge organization. Springer, 2008 pp. 261-268. URL https://doi.org/10.1007/978-3-540-78246-9_31.
  • [36] Bishop CM. Pattern recognition and machine learning. Springer, 2006. URL https://www.springer.com/in/book/9780387310732.
  • [37] Müller M. Fundamentals of music processing: Audio, analysis, algorithms, applications. Springer, 2015. URL www.music-processing.de.
  • [38] Cho YH, Lim H, Kim DW, Lee IK. Music emotion recognition using chord progressions. In: Proc. of the IEEE Intl. Conf. on Systems, Man, and Cybernetics (SMC). IEEE, 2016 pp. 2588-2593. URL https://doi.org/10.1109/SMC.2016.7844628.
  • [39] Ellis DP, Weller AV. The 2010 LABROSA chord recognition system. 2010. URL https://doi.org/10.7916/D8TT5193.
  • [40] Harte C, Sandler M. Automatic chord identification using a quantised chromagram. In: Proc. of the 118th Audio Engineering Society Convention. AES, 2005 URL http://www.aes.org/e-lib/browse.cfm?elib=13128.
  • [41] Zhang D, Zhou ZH, Chen S. Adaptive kernel Principal Component Analysis with unsupervised learning of kernels. In: Proc. of the IEEE Intl. Conf. on Data Mining (ICDM). IEEE, 2006 pp. 1178-1182. URL https://doi.org/10.1109/ICDM.2006.14.
  • [42] Han J, Pei J, Kamber M. Data mining: Concepts and techniques. Elsevier, 2011. URL https://www.sciencedirect.com/book/9780123814791/data-mining-concepts-and-techniques.
Notes
Record prepared with funding from MNiSW, agreement No. 461252, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science), module: popularisation of science and promotion of sport (2021).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-5987c145-675c-40af-94b3-69b21bd3779d