

Article title

Speech emotion recognition under white noise

Authors
Content
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
A speaker's emotional state is recognized from speech signals corrupted by additive white Gaussian noise (AWGN). The influence of white noise on a typical emotion recognition system is studied. The emotion classifier is implemented with a Gaussian mixture model (GMM). A Chinese speech emotion database covering nine emotion classes (happiness, sadness, anger, surprise, fear, anxiety, hesitation, confidence and the neutral state) is used for training and testing. Two speech enhancement algorithms are introduced to improve emotion classification. In the experiments, the Gaussian mixture model is trained on clean speech data and tested under AWGN at various signal-to-noise ratios (SNRs). Both the emotion class model and the dimension space model are adopted to evaluate the emotion recognition system. In the emotion class model, the nine emotion classes are classified directly. In the dimension space model, the arousal dimension and the valence dimension are each classified into a positive or a negative region. The experimental results show that the speech enhancement algorithms consistently improve the performance of our emotion recognition system across various SNRs, and that positive emotions are more likely to be misclassified as negative emotions in a white-noise environment.
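The train-clean/test-noisy protocol described above can be sketched in Python. This is a minimal illustration, not the authors' implementation: it uses NumPy and scikit-learn (neither is mentioned in the paper), and the helper names `add_awgn` and `GMMEmotionClassifier` are hypothetical. It shows the two building blocks the abstract names: corrupting a signal with white Gaussian noise at a target SNR, and classifying an utterance by scoring its features against one GMM per emotion class.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def add_awgn(signal, snr_db, rng=None):
    """Corrupt a clean signal with white Gaussian noise at a target SNR (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    sig_power = np.mean(signal ** 2)
    # SNR(dB) = 10*log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR/10)
    noise_power = sig_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


class GMMEmotionClassifier:
    """One GMM per emotion class; predict the class with the highest log-likelihood."""

    def __init__(self, n_components=4):
        self.n_components = n_components
        self.models = {}

    def fit(self, features_by_class):
        # features_by_class: {label: (n_frames, n_features) array of training features}
        for label, feats in features_by_class.items():
            gmm = GaussianMixture(n_components=self.n_components,
                                  covariance_type="diag", random_state=0)
            gmm.fit(feats)
            self.models[label] = gmm

    def predict(self, feats):
        # Average log-likelihood of the utterance's frames under each class model.
        scores = {label: gmm.score(feats) for label, gmm in self.models.items()}
        return max(scores, key=scores.get)
```

Fitting the classifier on clean training features and calling `predict` on features extracted from `add_awgn`-corrupted speech mirrors the evaluation setup in the abstract; sweeping `snr_db` reproduces the various-SNR conditions.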
Year
Pages
457–463
Physical description
Bibliography: 27 items, tables, charts
Contributors
author
  • School of Information Science and Engineering, Southeast University, 2# Sipailou, Nanjing 210096, Jiangsu Prov., China
author
  • School of Information Science and Engineering, Southeast University, 2# Sipailou, Nanjing 210096, Jiangsu Prov., China
author
  • School of Information Science and Engineering, Southeast University, 2# Sipailou, Nanjing 210096, Jiangsu Prov., China
author
  • School of Communication Engineering, Nanjing Institute of Technology, 1# Hongjing Ave., Nanjing 211167, Jiangsu Prov., China
author
  • School of Information Science and Engineering, Southeast University, 2# Sipailou, Nanjing 210096, Jiangsu Prov., China
Bibliography
  • 1. Ang J., Dhillon R., Krupski A., Shriberg E., Stolcke A. (2002), Prosody-based automatic detection of annoyance and frustration in human-computer dialog, 7th International Conference on Spoken Language Processing, pp. 2037-2040, Denver, Colorado, USA.
  • 2. Ayadia M.E., Kamelb M.S., Karray F. (2010), Survey on speech emotion recognition: Features, classification schemes, and databases, Pattern Recognition, 44, 3, 572-587.
  • 3. Boll S. (1979), Suppression of acoustic noise in speech using spectral subtraction, IEEE Transactions on Acoustics, Speech and Signal Processing, 27, 2, 113-120.
  • 4. Cai L. (2005), Speech emotion analysis and recognition based on data fusion, Master Thesis, Department of Radio Engineering, Southeast University, China.
  • 5. Chen G., Zhao L., Zhou C. (2007), Speech Enhancement Based on Masking Properties and Short-Time Spectral Amplitude Estimation, Journal of Electronics & Information Technology, 29, 4, 863-866.
  • 6. Clavel C., Vasilescu I., Devillers L., Richard G., Ehrette T. (2008), Fear-type emotion recognition for future audio-based surveillance systems, Speech Communication, 50, 487-503.
  • 7. Cohen I. (2005), Relaxed statistical model for speech enhancement and a priori SNR estimation, IEEE Transactions on Speech and Audio Processing, 13, 5, 870-881.
  • 8. Gobl C., Chasaide A.N. (2003), The role of voice quality in communicating emotion, mood and attitude, Speech Communication, 40, 189-212.
  • 9. Huang C., Jin Y., Zhao Y., Yu Y., Zhao L. (2009), Speech emotion recognition based on recomposition of two-class classifiers, 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops, pp. 1-3, Amsterdam, Netherlands.
  • 10. Huang C., Zhao Y., Jin Y., Yu Y., Zhao L. (2011), A Study on Feature Analysis and Recognition for Practical Speech Emotion, Journal of Electronics & Information Technology, 33, 1, 112-116.
  • 11. Johnston J.D. (1988), Transform coding of audio signals using perceptual noise criteria, IEEE Journal on Selected Areas in Communications, 6, 2, 314-323.
  • 12. Johnstone T., van Reekum C.M., Hird K., Kirsner K., Scherer K.R. (2005), Affective speech elicited with a computer game, Emotion, 5, 4, 513-518.
  • 13. Jones M.C., Jonson I.M. (2005), Automatic recognition of affective cues in the speech of car drivers to allow appropriate responses, Proceedings of the 17th Australia conference on Computer-Human Interaction: Citizens Online: Considerations for Today and the Future, Canberra, Australia.
  • 14. Kockmann M., Burget L., Cernocky J.H. (2011), Application of speaker- and language identification state-of-the-art techniques for emotion recognition, Speech Communication, 53, 1172-1185.
  • 15. Neiberg D., Elenius K., Laskowski K. (2006), Emotion recognition in spontaneous speech using GMMs, International Conference on Spoken Language Process, pp. 809-902, Pittsburgh, Pennsylvania, USA.
  • 16. Reynolds D.A., Rose R.C. (1995), Robust text-independent speaker identification using Gaussian mixture speaker models, IEEE Transactions on Speech Audio Process, 3, 72-83.
  • 17. Reynolds D.A. (1997), Comparison of background normalization methods for text-independent speaker verification, European Conference on Speech Communication and Technology, pp. 963-966, Rhodes, Greece.
  • 18. Schuller B., Arsic D., Wallhoff F., Rigoll G. (2006), Emotion recognition in the noise applying large acoustic feature sets, 3rd International Conference on Speech Prosody, Dresden, Germany.
  • 19. Scherer K.R. (2003), Vocal communication of emotion: A review of research paradigms, Speech Communication, 40, 227-256.
  • 20. Tawari A., Trivedi M. (2010), Speech emotion analysis in noisy real-world environment, International Conference on Pattern Recognition, pp. 4605-4609, Istanbul, Turkey.
  • 21. Truong K. (2009), How does real affect affect affect recognition in speech? Ph.D. Thesis, Department of Electrical Engineering, Mathematics and Computer Science, University of Twente.
  • 22. Tsoukalas D.E., Mourjopoulos J.N., Kokkinakis G. (1997), Speech enhancement based on audible noise suppression, IEEE Transactions on Speech and Audio Processing, 5, 6, 497-514.
  • 23. Varga A., Steeneken H.J.M. (1993), Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems, Speech Communication, 12, 3, 247-251.
  • 24. Virag N. (1999), Single Channel Speech Enhancement Based on Masking Properties of the Human Auditory System, IEEE Transactions on Speech and Audio Processing, 7, 2, 126-137.
  • 25. Wöllmer M., Eyben F., Reiter S., Schuller B., Cox C., Douglas-Cowie E., Cowie R. (2008), Abandoning emotion classes - Towards continuous emotion recognition with modeling of long-range dependencies, 9th Annual Conference of the International Speech Communication Association, pp. 597-601, Brisbane, Australia.
  • 26. Zeng Z., Pantic M., Roisman G.I., Huang T. (2009), A survey of affect recognition methods: audio, visual and spontaneous expressions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31, 1, 39-58.
  • 27. Zou C., Huang C., Han D., Zhao L. (2011), Detecting practical speech emotion in a cognitive task, 20th International Conference on Computer Communications and Networks, Maui, Hawaii, USA.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-f6720ea7-5b65-4e9b-8d72-a96bd6b3f273