Generative Model-Driven Feature Learning for dysarthric speech recognition

Rajeswari, N.; Chandrakala, S.

doi:10.1016/j.bbe.2016.05.003

Abstract

Recognition of speech uttered by speakers with severe dysarthria requires a robust learning technique. The hidden Markov model (HMM) is one of the most commonly used generative model-based classifiers for speech recognition. However, generative model-based classifiers perform poorly when classes overlap and when training data are insufficient. Dysarthric speech is typically partial or incomplete, which leads to improper learning of temporal dynamics. To overcome these issues, we focus on learning features for dysarthric speech recognition, a task that involves recognizing sequential patterns in utterances of varying length. We propose a discriminative framework based on Generative Model-Driven Feature Learning that maps each sequence of feature vectors into a fixed-dimensional vector space induced by the generative models. The discriminative classifier is built in that vector space. The proposed HMM-based fixed-dimensional vector representation provides better discrimination for dysarthric speech than the conventional HMM. We evaluate the proposed method on the recognition of isolated utterances from the UA-Speech database, where its recognition accuracy is higher than that of the conventional HMM-based approach.
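The abstract describes mapping variable-length sequences of feature vectors into a fixed-dimensional space induced by generative models, then building a discriminative classifier in that space. The abstract does not state which mapping is used, so the Python sketch below assumes one common choice, a likelihood score vector: each utterance is represented by its per-frame log-likelihood under each class HMM, computed with the scaled forward algorithm. The discrete observation symbols, the two hand-set HMMs (`hmm_word_a`, `hmm_word_b`) and the length normalization are hypothetical simplifications for illustration; a real system would use HMMs trained on continuous acoustic features.

```python
import numpy as np


def forward_log_likelihood(obs, pi, A, B):
    """Return log P(obs | HMM) using the scaled forward algorithm.

    obs: sequence of discrete symbol indices (any length)
    pi:  initial state probabilities, shape (N,)
    A:   state transition matrix, shape (N, N)
    B:   emission probabilities, shape (N, M) for M symbols
    """
    alpha = pi * B[:, obs[0]]
    scale = alpha.sum()
    alpha = alpha / scale            # rescale to avoid numerical underflow
    log_lik = np.log(scale)
    for o in obs[1:]:
        # propagate through transitions, then weight by emission probability
        alpha = (alpha @ A) * B[:, o]
        scale = alpha.sum()
        alpha = alpha / scale
        log_lik += np.log(scale)     # sum of log scale factors = log P(obs)
    return log_lik


def score_vector(obs, class_hmms):
    """Map a variable-length sequence to a fixed-length vector with one
    entry per class HMM. Assumption: scores are divided by the sequence
    length (per-frame log-likelihood) so short and long utterances are
    comparable."""
    return np.array([forward_log_likelihood(obs, *hmm) / len(obs)
                     for hmm in class_hmms])


# Hypothetical two-state, three-symbol HMMs for two word classes (not trained).
A = np.array([[0.7, 0.3], [0.0, 1.0]])          # left-to-right topology
pi = np.array([1.0, 0.0])
hmm_word_a = (pi, A, np.array([[0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]))
hmm_word_b = (pi, A, np.array([[0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]))
class_hmms = [hmm_word_a, hmm_word_b]

short_utt = [0, 0, 2]
long_utt = [1, 1, 1, 2, 2]
v_short = score_vector(short_utt, class_hmms)
v_long = score_vector(long_utt, class_hmms)
print(v_short.shape, v_long.shape)  # both (2,): same dimension despite different lengths
```

Whatever the utterance length, each vector has one entry per class model, so vectors from many utterances can be stacked into a training matrix for a discriminative classifier such as a support vector machine (not shown).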

Keywords

generative model-driven feature learning; dysarthric speech recognition; support vector machine; varying length sequences; feature vector representation

speech recognition; dysarthria; support vector machine

Publisher

Nałęcz Institute of Biocybernetics and Biomedical Engineering of the Polish Academy of Sciences
Elsevier

Journal

Biocybernetics and Biomedical Engineering

Year

2016

Volume

Vol. 36, no. 4

Pages

553–561

Physical description

Bibliography: 35 items; figures, tables, charts

Authors

author

Rajeswari N.

raji@svce.ac.in, rajeswari.natarajan@gmail.com

Department of Computer Science and Engineering, Sri Venkateswara College of Engineering, Pennalur, Sriperumbudur, Tamil Nadu 602117, India

author

Chandrakala S.

Department of Computer Science and Engineering, Rajalakshmi Engineering College, Rajalakshmi Nagar, Thandalam, Chennai, India

Notes

Prepared with funds from the Polish Ministry of Science and Higher Education (MNiSW) under agreement 812/P-DUN/2016 for science-dissemination activities.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-26c8193d-b105-4315-809c-9828d787b3f4