Article title

Automatic speech based emotion recognition using paralinguistics features

Content
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
Affective computing studies and develops systems capable of detecting human affects. The search for universal, well-performing features for speech-based emotion recognition is ongoing. In this paper, a small set of features with support vector machines as the classifier is evaluated on the Surrey Audio-Visual Expressed Emotion database, the Berlin Database of Emotional Speech, the Polish Emotional Speech database and the Serbian emotional speech database. It is shown that a set of 87 features can offer results on par with the state of the art, yielding 80.21, 88.6, 75.42 and 93.41% average emotion recognition rates, respectively. In addition, an experiment is conducted to explore the significance of gender in emotion recognition using random forests. Two models, trained on the first and the second database respectively, and four speakers were used to determine the effects. It is seen that the feature set used in this work performs well for both male and female speakers, yielding approximately 27% average emotion recognition in both models. In addition, the emotions of female speakers were recognized 18% of the time in the first model and 29% in the second. A similar effect is seen with male speakers: the first model yields a 36%, the second a 28% average emotion recognition rate. This illustrates the relationship between the constitution of the training data and emotion recognition accuracy.
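To make the described pipeline concrete, the following is a minimal sketch, not the authors' implementation: it assumes the 87 paralinguistic features have already been extracted per utterance (e.g. with Praat) into a feature matrix, uses scikit-learn's SVM and random forest classifiers in place of the paper's LIBSVM-based setup, and runs on randomly generated placeholder data rather than on the four databases.

import numpy as np
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Placeholder data: in the paper each utterance is described by 87
# paralinguistic features and labelled with one emotion class.
rng = np.random.default_rng(0)
X = rng.normal(size=(480, 87))      # hypothetical feature matrix (utterances x features)
y = rng.integers(0, 7, size=480)    # hypothetical emotion labels (e.g. 7 classes)

# SVM with an RBF kernel; features are standardised first, as is common
# practice for SVM-based classification.
svm_clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm_scores = cross_val_score(svm_clf, X, y, cv=5)
print("SVM mean recognition rate: %.2f%%" % (100 * svm_scores.mean()))

# Random forest classifier, as used in the paper's gender experiment.
rf_clf = RandomForestClassifier(n_estimators=100, random_state=0)
rf_scores = cross_val_score(rf_clf, X, y, cv=5)
print("Random forest mean recognition rate: %.2f%%" % (100 * rf_scores.mean()))

In an actual evaluation, X and y would hold the extracted feature vectors and emotion labels of the respective database, and speaker-independent splits would typically replace plain cross-validation.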
Year
Pages
479–488
Physical description
Bibliography: 64 items, tables, figures
Authors
author
  • iCV Research of Tartu, Tartu 50411, Estonia
author
author
  • Department of Computer Engineering, Eastern Mediterranean University, Famagusta, North Cyprus, via Mersin 10, Turkey
  • iCV Research of Tartu, Tartu 50411, Estonia
  • Department of Electrical and Electronic Engineering, Hasan Kalyoncu University, Gaziantep, Turkey.
Bibliography
  • [1] F. Noroozi, M. Marjanovic, A. Njegus, S. Escalera, and G. Anbarjafari, “Fusion of classifier predictions for audio-visual emotion recognition”, in Pattern Recognition (ICPR), 2016 23rd International Conference on. IEEE, 61–66, 2016.
  • [2] D. Kaminska, T. Sapinski, and G. Anbarjafari, “Efficiency of chosen speech descriptors in relation to emotion recognition”, EURASIP Journal on Audio, Speech, and Music Processing 2017 (1), 3 (2017).
  • [3] F. Noroozi, T. Sapinski, D. Kaminska, and G. Anbarjafari, “Vocal-based emotion recognition using random forests and decision tree”, International Journal of Speech Technology 20 (2), 239–246 (2017).
  • [4] J. Guo, Z. Lei, J. Wan, E. Avots, N. Hajarolasvadi, B. Knyazev, A. Kuharenko, J.C. Jacques, X. Baró, H. Demirel et al., “Dominant and complementary emotion recognition from still images of faces”, IEEE Access, 2018.
  • [5] R.E. Haamer, K. Kulkarni, N. Imanpour, M.A. Haque, E. Avots, M. Breisch, K. Nasrollahi, S.E. Guerrero, C. Ozcinar, X. Baro et al., “Changes in facial expression as biometric: a database and benchmarks of identification”, in IEEE Conf. on Automatic Face and Gesture Recognition Workshops. IEEE, 2018.
  • [6] A. Jaimes and N. Sebe, “Multimodal human–computer interaction: A survey”, Computer vision and image understanding, 108 (1-2), 116–134 (2007).
  • [7] N. Kamaruddin and A. Wahab, “Heterogeneous driver behavior state recognition using speech signal”, in Proceedings of the 10th WSEAS international conference on System science and simulation in Engineering, 207–212, 2011.
  • [8] A. Tawari and M. Trivedi, “Speech based emotion classification framework for driver assistance system”, in Intelligent Vehicles Symposium (IV), 2010 IEEE, 174–178, 2010.
  • [9] M. Grimm, K. Kroschel, H. Harris, C. Nass, B. Schuller, G. Rigoll, and T. Moosmayr, “On the necessity and feasibility of detecting a driver’s emotional state while driving”, in International Conference on Affective Computing and Intelligent Interaction. Springer, 126–138, 2007.
  • [10] A. Tickle, S. Raghu, and M. Elshaw, “Emotional recognition from the speech signal for a virtual education agent”, in Journal of Physics: Conference Series 450 (1), 012053, IOP Publishing, 2013.
  • [11] M. Gong and Q. Luo, “Speech emotion recognition in web based education”, in Grey Systems and Intelligent Services, 2007. GSIS 2007. IEEE International Conference on IEEE, 1082–1086, 2007.
  • [12] W. Li, Y. Zhang, and Y. Fu, “Speech emotion recognition in elearning system based on affective computing”, in Natural Computation, 2007. ICNC 2007. Third International Conference on IEEE, 5, 809–813, 2007.
  • [13] F.-M. Lee, L.-H. Li, and R.-Y. Huang, “Recognizing low/high anger in speech for call centers”, in International Conference on Signal Processing, Robotics and Automation, 171–176, 2008.
  • [14] V. Petrushin, “Emotion in speech: Recognition and application to call centers”, in Proceedings of Artificial Neural Networks in Engineering 710, 1999.
  • [15] D. Morrison, R. Wang, and L.C. De Silva, “Ensemble methods for spoken emotion recognition in call-centres”, Speech communication 49 (2), 98–112 (2007).
  • [16] M. Szwoch and W. Szwoch, “Emotion recognition for affect aware video games”, in Image Processing & Communications Challenges 6, 227–236, Springer, 2015.
  • [17] J. Torous, R. Friedman, and M. Keshavan, “Smartphone ownership and interest in mobile applications to monitor symptoms of mental health conditions”, JMIR mHealth and uHealth 2 (1), e2 (2014).
  • [18] M.S. Hossain and G. Muhammad, “Cloud-assisted speech and face recognition framework for health monitoring”, Mobile Networks and Applications 20 (3), 391–399 (2015).
  • [19] M.R. Kandalaft, N. Didehbani, D.C. Krawczyk, T.T. Allen, and S.B. Chapman, “Virtual reality social cognition training for young adults with high-functioning autism”, Journal of autism and developmental disorders 43 (1), 34–44 (2013).
  • [20] I. Lüsi and G. Anbarjafari, “Mimicking speaker’s lip movement on a 3d head model using cosine function fitting”, Bulletin of the Polish Academy of Sciences Technical Sciences 65 (5), 733–739 (2017).
  • [21] G. Anbarjafari, R.E. Haamer, I. Lusi, T. Tikk, and L. Valgma, “3D face reconstruction with region based best fit blending using mobile phone for virtual reality based social media”, Bulletin of the Polish Academy of Sciences Technical Sciences 67 (1), 125–132 (2019).
  • [22] J. Gorbova, I. Lüsi, A. Litvin, and G. Anbarjafari, “Automated screening of job candidate based on multimodal video processing”, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 29–35, 2017.
  • [23] D. Kaminska and A. Pelikant, “Recognition of human emotion from a speech signal based on plutchik’s model”, International Journal of Electronics and Telecommunications 58 (2), 165–170 (2012).
  • [24] C.-N. Anagnostopoulos, T. Iliou, and I. Giannoukos, “Features and classifiers for emotion recognition from speech: a survey from 2000 to 2011”, Artificial Intelligence Review 43 (2), 155–177 (2015).
  • [25] T. L. Nwe, S.W. Foo, and L. C. De Silva, “Speech emotion recognition using hidden markov models”, Speech Communication 41 (4), 603–623 (2003).
  • [26] B. Schuller, G. Rigoll, and M. Lang, “Hidden markov model-based speech emotion recognition”, in Multimedia and Expo, 2003. ICME’03. Proceedings. 2003 International Conference on IEEE, 2, 401–401 (2003).
  • [27] C. M. Lee, S. Yildirim, M. Bulut, A. Kazemzadeh, C. Busso, Z. Deng, S. Lee, and S. Narayanan, “Emotion recognition based on phoneme classes”, in Eighth International Conference on Spoken Language Processing, 2004.
  • [28] K. Rychlicki-Kicior and B. Stasiak, “Multipitch estimation using judge-based model”, Bulletin of the Polish Academy of Sciences Technical Sciences 62 (4), 751–757 (2014).
  • [29] F. Noroozi, D. Kaminska, T. Sapinski, and G. Anbarjafari, “Supervised vocal-based emotion recognition using multiclass support vector machine, random forests, and adaboost”, Journal of the Audio Engineering Society 65 (7/8), 562–572 (2017).
  • [30] B. Schuller, A. Batliner, D. Seppi, S. Steidl, T. Vogt, J. Wagner, L. Devillers, L. Vidrascu, N. Amir, L. Kessous et al., “The relevance of feature type for the automatic classification of emotional user states: Low level descriptors and functionals”, in Eighth Annual Conference of the International Speech Communication Association, 2007.
  • [31] M. Lugger and B. Yang, “An incremental analysis of different feature groups in speaker independent emotion recognition”, in 16th Int. congress of phonetic sciences, 2007.
  • [32] B. Vlasenko, B. Schuller, A. Wendemuth, and G. Rigoll, “Frame vs. turn-level: emotion recognition from speech considering static and dynamic processing”, in International Conference on Affective Computing and Intelligent Interaction, 139–147, Springer, 2007.
  • [33] C. Yang, L. Ji, and G. Liu, “Study to speech emotion recognition based on twinssvm”, in Natural Computation, 2009. ICNC’09. Fifth International Conference on IEEE, 312–316, 2009.
  • [34] Y. Jin, Y. Zhao, C. Huang, and L. Zhao, “Study on the emotion recognition of whispered speech”, in Intelligent Systems, 2009. GCIS’09. WRI Global Congress on IEEE, 3, 242–246, 2009.
  • [35] Y. Zhou, Y. Sun, L. Yang, and Y. Yan, “Applying articulatory features to speech emotion recognition”, in Research Challenges in Computer Science, 2009. ICRCCS’09. International Conference on IEEE, 73–76, 2009.
  • [36] T.-L. Pao, W.-Y. Liao, Y.-T. Chen, J.-H. Yeh, Y.-M. Cheng, and C.S. Chien, “Comparison of several classifiers for emotion recognition from noisy mandarin speech”, in Intelligent Information Hiding and Multimedia Signal Processing, 2007. IIHMSP 2007. Third International Conference on IEEE, 1, 23–26, 2007.
  • [37] X. Mao, L. Chen, and L. Fu, “Multi-level speech emotion recognition based on hmm and ann”, in Computer Science and Information Engineering, 2009 WRI World Congress on IEEE, 7, 225–229, 2009.
  • [38] A. Shaukat and K. Chen, “Towards automatic emotional state categorization from speech signals”, in Ninth Annual Conference of the International Speech Communication Association, 2008.
  • [39] A. Hassan and R.I. Damper, “Multi-class and hierarchical svms for emotion recognition”, in Proc. Interspeech, 2010.
  • [40] A. Shaukat and K. Chen, Emotional state categorization from speech: machine vs. human, arXiv preprint arXiv:1009.0108, 2010.
  • [41] V. Kobayashi and V. Calag, “Detection of affective states from speech signals using ensembles of classifiers”, in IET Intelligent Signal Processing Conference, 2013.
  • [42] B.-C. Chiou and C.-P. Chen, “Feature space dimension reduction in speech emotion recognition using support vector machine”, in Signal and information processing association annual summit and conference (APSIPA), 2013 Asia-Pacific IEEE, 1–6, 2013.
  • [43] E. Yüncü, H. Hacihabiboglu, and C. Bozsahin, “Automatic speech emotion recognition using auditory models with binary decision tree and svm”, in Pattern Recognition (ICPR), 2014 22nd International Conference on IEEE, 773–778, 2014.
  • [44] P. Boersma and D. Weenink, “Praat: doing phonetics by computer (version 6.0.36) [computer program], retrieved January 1, 2018”, 2018.
  • [45] C.-W. Hsu, C.-C. Chang, C.-J. Lin et al., A practical guide to support vector classification, 2003.
  • [46] C.-C. Chang and C.-J. Lin, “LIBSVM: A library for support vector machines”, ACM Transactions on Intelligent Systems and Technology 2, 27:1–27:27 (2011), software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
  • [47] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine learning in Python”, Journal of Machine Learning Research 12, 2825–2830 (2011).
  • [48] P. Patel, A. Chaudhari, R. Kale, and M. Pund, “Emotion recognition from speech with gaussian mixture models & via boosted gmm”, International Journal of Research in Science & Engineering, 3 (2017).
  • [49] S.R. Krothapalli and S.G. Koolagudi, “Speech emotion recognition: a review”, in Emotion Recognition using Speech Features, 15–34, Springer, 2013.
  • [50] D. Gharavian, M. Bejani, and M. Sheikhan, “Audio-visual emotion recognition using fcbf feature selection method and particle swarm optimization for fuzzy artmap neural networks”, Multimedia Tools and Applications 76 (2), 2331–2352 (2017).
  • [51] J. Deng, X. Xu, Z. Zhang, S. Frühholz, D. Grandjean, and B. Schuller, “Fisher kernels on phase-based features for speech emotion recognition”, in Dialogues with Social Robots, 195–203, Springer, 2017.
  • [52] N. Yang, J. Yuan, Y. Zhou, I. Demirkol, Z. Duan, W. Heinzelman, and M. Sturge-Apple, “Enhanced multiclass svm with thresholding fusion for speech-based emotion classification”, International Journal of Speech Technology 20 (1), 27–41 (2017).
  • [53] Z.-T. Liu, M. Wu, W.-H. Cao, J.-W. Mao, J.-P. Xu, and G.-Z. Tan, “Speech emotion recognition based on feature selection and extreme learning machine decision tree”, Neurocomputing, 2017.
  • [54] F. Noroozi, N. Akrami, and G. Anbarjafari, “Speech-based emotion recognition and next reaction prediction”, in Signal Processing and Communications Applications Conference (SIU), 2017 25th, 1–4, IEEE, 2017.
  • [55] F. Eibe, M. Hall, I. Witten, and J. Pal, “The weka workbench”, Online Appendix for Data Mining: Practical Machine Learning Tools and Techniques, 4, 2016.
  • [56] L. Breiman, “Random forests”, Machine learning 45 (1), 5–32 (2001).
  • [57] J. Gama, R. Rocha, and P. Medas, “Accurate decision trees for mining high-speed data streams”, in Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, 523–528, ACM, 2003.
  • [58] L. Breiman, “Bias, variance, and arcing classifiers”, 1996.
  • [59] M.-L. Zhang and Z.-H. Zhou, “A review on multi-label learning algorithms”, IEEE Transactions on Knowledge and Data Engineering 26 (8), 1819–1837 (2014).
  • [60] G. Louppe, Understanding random forests: From theory to practice, arXiv preprint arXiv:1407.7502, 2014.
  • [61] P. Jackson and S. Haq, Surrey Audio-Visual Expressed Emotion (SAVEE) database, University of Surrey: Guildford, UK, 2014.
  • [62] F. Burkhardt, A. Paeschke, M. Rolfes, W. F. Sendlmeier, and B. Weiss, “A database of german emotional speech”, in Interspeech, 5, 1517–1520 (2005).
  • [63] P. Staroniewicz, “Polish emotional speech database–design”, in Proc. of 55th Open Seminar on Acoustics, Wroclaw, Poland, 373–378, 2008.
  • [64] S.T. Jovicic, Z. Kasic, M. Dordevic, and M. Rajkovic, “Serbian emotional speech database: design, processing and evaluation”, in 9th Conference Speech and Computer, 2004.
Notes
Record prepared under agreement 509/P-DUN/2018 from MNiSW (Ministry of Science and Higher Education) funds allocated to science dissemination activities (2019).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-f877f8f2-1961-4ff9-8ba4-42545c30729f