Analiza parametrów sygnału mowy w kontekście ich przydatności w automatycznej ocenie jakości ekspresji śpiewu

Zaporowski, Szymon; Kostek, Bożena

doi:10.32016/1.68.13

Title variants

Analysis of the speech signal parameters in the context of their suitability in the automatic quality of singing expression assessment

Conference

Zastosowanie komputerów w nauce i technice 2019 (XXIX; 2019; Gdańsk; Poland)

Publication languages

Abstract

This paper concerns the parameterization approach used to classify emotions in singing and compares it with the classification of emotions in speech. For this purpose, the RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song) database of emotionally expressive speech and song was used; it contains recordings of professional actors presenting six different emotions. Mel-frequency cepstral coefficients (MFCC) and selected low-level MPEG-7 descriptors were then calculated. A feature selection algorithm based on a forest of trees was used to identify the coefficients and descriptors with the best ranking results. The emotions were then classified using a Support Vector Machine (SVM). The classification was repeated several times, and the results were averaged. It was found that the parameterization that is effective for emotion detection in speech is not as useful for singing. Basic parameters for singing were determined which, according to the obtained results, allow a significant reduction in the dimensionality of the feature vectors while increasing classification accuracy.
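The processing chain described in the abstract (rank features with a forest of trees, keep the best ones, classify with an SVM, repeat and average) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the feature matrix is synthetic data standing in for MFCC and MPEG-7 descriptors, the choice of `ExtraTreesClassifier` as the forest is an assumption, and the helper names `build_model` and `mean_accuracy` as well as all sizes and hyperparameters are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for a feature matrix (rows: recordings, columns:
# MFCC / MPEG-7 style descriptors). Six classes mirror the six emotions;
# every size below is an illustrative assumption, not a value from the paper.
X, y = make_classification(
    n_samples=600, n_features=40, n_informative=8, n_redundant=4,
    n_classes=6, n_clusters_per_class=1, random_state=0,
)


def build_model(n_keep, seed):
    """Scale, rank features with a forest of trees, keep the top n_keep, fit an SVM."""
    forest = ExtraTreesClassifier(n_estimators=100, random_state=seed)
    # threshold=-inf makes max_features the only selection criterion,
    # so exactly the n_keep highest-ranked features are kept.
    selector = SelectFromModel(forest, max_features=n_keep, threshold=-np.inf)
    return make_pipeline(StandardScaler(), selector, SVC(kernel="rbf", C=1.0))


def mean_accuracy(n_keep, n_repeats=3):
    """Repeat 5-fold cross-validation with different seeds and average the scores."""
    scores = []
    for seed in range(n_repeats):
        cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
        scores.extend(cross_val_score(build_model(n_keep, seed), X, y, cv=cv))
    return float(np.mean(scores))


acc_all = mean_accuracy(n_keep=X.shape[1])   # no reduction: all 40 features
acc_selected = mean_accuracy(n_keep=10)      # reduced feature vector
print(f"all {X.shape[1]} features: {acc_all:.3f}")
print(f"top 10 features: {acc_selected:.3f}")
```

Placing the selector inside the pipeline means the feature ranking is recomputed on each training fold, so the held-out fold does not influence which features are kept. With real recordings, the synthetic `X` would be replaced by per-recording descriptor vectors.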

Keywords

low-level signal descriptors; singing analysis; parameter extraction; singing; emotions

Mel Frequency Cepstral Coefficient; Low-Level MPEG 7 Audio Descriptor; singing analysis; feature selection

Publisher

Wydział Elektrotechniki i Automatyki Politechniki Gdańskiej

Journal

Zeszyty Naukowe Wydziału Elektrotechniki i Automatyki Politechniki Gdańskiej

Year

2019

Volume

No. 68

Pages

61–64

Physical description

Bibliography: 16 items, figures, tables.

Contributors

author

Zaporowski Szymon

smck@multimed.org

Department of Multimedia Systems, Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology

author

Kostek Bożena

bokostek@audioakustyka.org

Audio Acoustics Laboratory, Faculty of Electronics, Telecommunications and Informatics, Gdańsk University of Technology

Bibliography

1. D. Bertero and P. Fung: A first look into a Convolutional Neural Network for speech emotion detection, in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2017, 5115–5119.
2. L. Kerkeni, Y. Serrestou, K. Raoof, C. Cléder, M. Mahjoub, and M. Mbarki: Automatic Speech Emotion Recognition Using Machine Learning, 2019, p. https://www.intechopen.com/online-first/automatic.
3. K. R. Scherer, J. Sundberg, L. Tamarit, and G. L. Salomão: Comparing the acoustic expression of emotion in the speaking and the singing voice, Comput. Speech Lang., vol. 29, no. 1, 218–235, 2015.
4. N. Cibau, E. Albornoz, and H. Rufiner: Speech emotion recognition using a deep autoencoder, 2013.
5. M. C. Sezgin, B. Gunsel, and G. K. Kurt: Perceptual audio features for emotion detection, EURASIP J. Audio, Speech, Music Process., vol. 2012, no. 1, p. 16, 2012.
6. S. S. Poorna, C. Y. Jeevitha, S. J. Nair, S. Santhosh, and G. J. Nair: Emotion recognition using multiparameter speech feature classification, in 2015 International Conference on Computers, Communications, and Systems (ICCCS), 2015, 217–222.
7. P. Zwan: Expert system for automatic classification and quality assessment of singing voices, Audio Eng. Soc. - 121st Conv. Pap. 2006, vol. 1, 446–454, Jan. 2006.
8. N. Amir, O. Michaeli, and O. Amir: Acoustic and perceptual assessment of vibrato quality of singing students, Biomed. Signal Process. Control, vol. 1, 144–150, Apr. 2006.
9. E. Półrolniczak and M. Łazoryszczak: Quality assessment of intonation of choir singers using F0 and trend lines for singing sequence, Metod. Inform. Stosow., no. 4, 259–268, 2011.
10. S. R. Livingstone and F. A. Russo: The Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS): A dynamic, multimodal set of facial and vocal expressions in North American English, PLoS ONE, vol. 13, no. 5, 2018.
11. B. McFee et al.: librosa/librosa: 2019.
12. S. Zaporowski and A. Czyżewski: Selection of Features for Multimodal Vocalic Segments Classification, in Multimedia and Network Information Systems, 2019, 490–500.
13. P. Geurts, D. Ernst, and L. Wehenkel: Extremely randomized trees, Mach. Learn., vol. 63, no. 1, 3–42, 2006.
14. G. Louppe, L. Wehenkel, A. Sutera, and P. Geurts: Understanding variable importances in forests of randomized trees, Advances in Neural Information Processing Systems 26 (C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, Eds.) Curran Associates, Inc., 2013, 431–439.
15. V. Svetnik, A. Liaw, C. Tong, J. C. Culberson, R. P. Sheridan, and B. P. Feuston: Random Forest: A Classification and Regression Tool for Compound Classification and QSAR Modeling, J. Chem. Inf. Comput. Sci., vol. 43, no. 6, 1947–1958, Nov. 2003.
16. F. Pedregosa et al.: Scikit-learn: Machine Learning in Python, J. Mach. Learn. Res., vol. 12, 2825–2830, 2011.

Notes

Record prepared with funds from MNiSW (Ministry of Science and Higher Education), agreement No. 461252, under the programme "Social Responsibility of Science" - module: Popularisation of science and promotion of sport (2020).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-639906be-f565-473d-9078-59667c14e2eb