Article title

Application of Teager Energy Operator on Linear and Mel Scales for Whispered Speech Recognition

Content
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
This paper presents experimental results on whispered speech recognition using the Teager Energy Operator with linear and mel cepstral coefficients, including the Cepstral Mean Subtraction normalization technique. Four feature vectors are considered: Linear Frequency Cepstral Coefficients, Teager Energy based Linear Frequency Cepstral Coefficients, Mel Frequency Cepstral Coefficients, and Teager Energy based Mel Frequency Cepstral Coefficients. A speaker-dependent scenario is used. Recognition is performed with Dynamic Time Warping and Hidden Markov Models. The results show a respectable improvement in whispered speech recognition when the Teager Energy Operator is combined with Cepstral Mean Subtraction.
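The two signal-processing steps named in the abstract can be illustrated with a minimal sketch. It assumes the standard discrete Teager Energy Operator, psi[n] = x[n]^2 - x[n-1]*x[n+1], and Cepstral Mean Subtraction taken as removal of each coefficient's mean over the frames of an utterance. The function names, the toy signal, and the random feature matrix are hypothetical, and this record does not specify the exact feature pipeline used in the paper.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager Energy Operator: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The output is two samples shorter than the input because the
    operator needs one neighbour on each side.
    """
    x = np.asarray(x, dtype=float)
    return x[1:-1] ** 2 - x[:-2] * x[2:]


def cepstral_mean_subtraction(features):
    """Subtract the per-coefficient mean computed over all frames.

    `features` is assumed to have shape (num_frames, num_coefficients).
    """
    features = np.asarray(features, dtype=float)
    return features - features.mean(axis=0, keepdims=True)


# Toy check: for a pure tone A*cos(w*n) the operator gives the constant
# A^2 * sin(w)^2, so it tracks both amplitude and frequency.
amplitude, omega = 0.5, 0.3
n = np.arange(200)
tone = amplitude * np.cos(omega * n)
psi = teager_energy(tone)

# Hypothetical cepstral features (50 frames, 12 coefficients) with a
# constant offset standing in for a channel effect.
rng = np.random.default_rng(0)
cepstra = rng.normal(size=(50, 12)) + 3.0
normalized = cepstral_mean_subtraction(cepstra)
```

After normalization every coefficient has zero mean across frames, which is what removes a constant (convolutional channel) offset from the cepstral features.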
Year
Pages
3-9
Physical description
Bibliography: 24 items, figures, tables, charts.
Authors
  • Department of Acoustics, School of Electrical Engineering, Blvd. Kralja Aleksandra 73, 11000 Belgrade, Serbia
author
  • Department of Acoustics, School of Electrical Engineering, Blvd. Kralja Aleksandra 73, 11000 Belgrade, Serbia
author
  • Department of Acoustics, School of Electrical Engineering, Blvd. Kralja Aleksandra 73, 11000 Belgrade, Serbia
Bibliography
  • 1. Catford J. C. (1977), Fundamental problems in phonetics, Edinburgh: Edinburgh University Press.
  • 2. De Veth J., Boves L. (1998), Channel normalization techniques for automatic speech recognition over the telephone, Speech Communication, 25, 149-164.
  • 3. Dimitriadis D., Maragos P., Potamianos A. (2005), Auditory Teager energy cepstrum coefficients for robust speech recognition, Proc. of European Conf. on Speech Communication and Technology – Interspeech 2005, Lisbon, Portugal, 3013-3016.
  • 4. Fan X., Hansen J. H. L. (2014), Speaker identification with whispered speech based on modified LFCC parameters and feature mapping, Proc. of International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, May 2014, pp. 4553-4556.
  • 5. Galić J., Jovičić S. T., Grozdić D., Marković B. (2014), HTK-based recognition of whispered speech, A. Ronzhin et al. [Eds.]: SPECOM 2014, LNAI 8773, Springer International Publishing Switzerland 2014, 251-259.
  • 6. Gang L., Heming Z. (2009), Formant frequency estimations of whispered speech in Chinese, Archives of Acoustics, 34, 2, 127-135.
  • 7. Gang L., Heming Z. (2012), Joint factor analysis of channel mismatch in whispering speaker verification, Archives of Acoustics, 37, 4, 555-559.
  • 8. Hansen J. H. L., Patil S. (2007), Speech under stress: analysis, modeling and recognition, [in:] Müller C. [Ed.], Speaker Classification I: Fundamentals, Features, and Methods, Springer, Berlin-Heidelberg, pp. 108-137.
  • 9. Heracleous P. (2009), Using teager energy cepstrum and HMM distances in automatic speech recognition and analysis of unvoiced speech, International Journal of Information and Communication Engineering, 5, 1, 31-37.
  • 10. Hidden Markov Model Toolkit (2016), http://htk.eng.cam.ac.uk/ (retrieved June 15, 2016).
  • 11. Ito T., Takeda K., Itakura F. (2005), Analysis and recognition of whispered speech, Speech Communication, 45, 139-152.
  • 12. Jovičić S. T. (1998), Formant feature differences between whispered and voiced sustained vowels, Acustica united with Acta Acustica, 84, 4, 739-743.
  • 13. Jovičić S. T., Šarić Z. M. (2008), Acoustic analysis of consonants in whispered speech, Journal of Voice, 22, 3, 263-274.
  • 14. Kaiser J. F. (1983), Some observations on vocal tract operation from a fluid flow point of view, in: Vocal Fold Physiology: Biomechanics, Acoustics and Phonatory Control, Titze I. R., Scherer R. C. [Eds.], Denver Center for the Performing Arts, Denver, CO, pp. 358-386.
  • 15. Kostek B. (1999), Soft computing in acoustics, applications of neural networks, fuzzy logic and rough sets to musical acoustics, Springer-Verlag, Berlin.
  • 16. Kozierski P., Sadalla T., Drgas S., Dąbrowski A., Horla D. (2016), Kaldi toolkit in Polish whispery speech recognition, Przegląd Elektrotechniczny, R. 92, 11, 301-304.
  • 17. Marković B., Galić J., Grozdić D., Jovičić S. T. (2013), Application of DTW method for whispered speech recognition, Proc. of 4th International Conference on Fundamental and Applied Aspects of Speech and Language, Belgrade, 308-315.
  • 18. Marković B., Jovičić S. T., Galić J., Grozdić D. (2013), Whispered speech database: design, processing and application, Proc. of 16th International Conference, TSD 2013, I. Habernal and V. Matousek [Eds.]: TSD 2013, LNAI 8082, Springer-Verlag Berlin Heidelberg, pp. 591-598.
  • 19. Neyman J., Pearson E. (1933), On the problem of the most efficient tests of statistical hypotheses, Philosophical Transactions of the Royal Society of London. Series A, 231, 289-337.
  • 20. Rabiner L., Juang B-H. (1993), Fundamentals of speech recognition, Prentice Hall, New Jersey.
  • 21. Sakoe H., Chiba S. (1978), Dynamic programming optimization for spoken word recognition, IEEE Transactions on Acoustics, Speech, and Signal Processing, 26, 1, 43-49.
  • 22. Tsunoda K., Sekimoto S., Baer T. (2012), Brain activity in aphonia after a coughing episode: different brain activity in healthy whispering and pathological aphonic conditions, Journal of Voice, 26, 5, 668.e11-668.e13.
  • 23. Zhang C., Hansen J. H. L. (2007), Analysis and classification of speech mode: whisper through shouted, Proc. of Interspeech 2007, pp. 2289-2292.
  • 24. Zhou X., Garcia-Romero D., Duraiswami R., Espy-Wilson C., Shamma S. (2011), Linear versus mel frequency cepstral coefficients for speaker recognition, 2011 IEEE Workshop on Automatic Speech Recognition & Understanding, ASRU 2011, Waikoloa, HI, USA, December 11-15, pp. 559-564.
Notes
Record prepared under agreement 509/P-DUN/2018, financed by the Ministry of Science and Higher Education (MNiSW) from funds allocated to science dissemination activities (2018).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-744ed02b-1c67-4988-900e-aaf6d5bf5477