Article title
Authors
Identifiers
Title variants
Publication languages
Abstracts
Recognizing speech uttered by speakers with severe dysarthria requires a robust learning technique. The hidden Markov model (HMM) is one of the most commonly used generative model-based classifiers for speech recognition. Generative model-based classifiers perform poorly when classes overlap and when training data are insufficient. Dysarthric speech is often partial or incomplete, which leads to improper learning of temporal dynamics. To overcome these issues, we focus on learning features for dysarthric speech recognition, a task that involves recognizing sequential patterns in utterances of varying length. We propose a discriminative framework based on Generative Model-Driven Feature Learning that maps a sequence of feature vectors to a fixed-dimensional vector space induced by the generative models. The discriminative classifier is built in that vector space. The proposed HMM-based fixed-dimensional vector representation provides better discrimination for dysarthric speech than the conventional HMM. We evaluate the proposed method on the recognition of isolated utterances from the UA-Speech database. The recognition accuracy of the proposed model is better than that of the conventional HMM-based approach.
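The mapping from variable-length utterances to a fixed-dimensional space can be illustrated with a minimal sketch. The abstract does not specify how the vector is constructed, so the code below assumes one common choice: scoring each sequence against every class HMM and using the length-normalised log-likelihoods as the feature vector. The discrete two-state models, their parameters, and the names `log_likelihood` and `score_vector` are hypothetical and serve only as an illustration, not as the authors' implementation.

```python
import math

def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | discrete HMM).

    Assumes every observation has non-zero probability under the model.
    """
    n = len(start)
    # Initialise with the first observation.
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transitions, then weight by the emission.
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                for j in range(n)
            ]
        # Rescale at every step to avoid numerical underflow.
        scale = sum(alpha)
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_p

def score_vector(obs, models):
    """Map a sequence of any length to one normalised score per HMM."""
    return [log_likelihood(obs, *m) / len(obs) for m in models]

# Hypothetical toy models over the symbols {0, 1}: (start, transitions, emissions).
model_a = ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.8, 0.2]])
model_b = ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.1, 0.9], [0.2, 0.8]])
models = [model_a, model_b]

# Two sequences of different lengths both map to vectors of length len(models).
vec_short = score_vector([0, 0, 0, 1, 0], models)
vec_long = score_vector([1, 1, 0, 1, 1, 1, 1], models)
```

Vectors such as `vec_short` and `vec_long` have the same dimension regardless of utterance length, so they could be passed to any discriminative classifier (for example, a support vector machine) trained in that space.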
Publisher
Journal
Year
Volume
Pages
553--561
Physical description
Bibliography: 35 items, figures, tables, charts.
Contributors
author
- Department of Computer Science and Engineering, Sri Venkateswara College of Engineering, Pennalur, Sriperumbudur, Tamil Nadu 602117, India
author
- Department of Computer Science and Engineering, Rajalakshmi Engineering College, Rajalakshmi Nagar, Thandalam, Chennai, India
Bibliography
- [1] Kent RD, Vorperian HK, Duffy JKJ. Voice dysfunction in dysarthria: application of the Multi-Dimensional Voice Program™. J Commun Disord 2003;36(4):281–306.
- [2] Bunton K, Kent RD, Kent JF, Duffy JR. The effects of flattening fundamental frequency contours on sentence intelligibility in speakers with dysarthria. Clin Linguist Phon 2001;15(3):181–93.
- [3] Ramig LO. The role of phonation in speech intelligibility: a review and preliminary data from patients with Parkinson's disease. Intell Speech Disord: Theory Meas Manage 1992;119–55.
- [4] Polur PD, Miller GE. Effect of high-frequency spectral components in computer recognition of dysarthric speech based on a mel-cepstral stochastic model. J Rehabil Res Dev 2005;42(3):363.
- [5] Selouani SA, Yakoub MS, O'Shaughnessy D. Alternative speech communication system for persons with severe speech disorders. EURASIP J Adv Signal Process 2009;2009:6.
- [6] Hasegawa-Johnson M, Gunderson J, Penman A, Huang T. HMM-based and SVM-based recognition of the speech of talkers with spastic dysarthria. IEEE International Conference on Acoustics, Speech and Signal Processing. ICASSP 2006 Proceedings, vol. 3. IEEE; 2006. p. III-1060–0.
- [7] Ksentini KPA, Viho C, Bonnin J. Perceptual evaluation of speech quality (PESQ): an objective method for end-to-end speech quality assessment of narrowband telephone networks and speech codecs. University of Rennes; 2009.
- [8] Menendez-Pidal X, Polikoff JB, Peters SM, Leonzio JE, Bunnel H. The Nemours database of dysarthric speech. Fourth International Conference on Spoken Language. ICSLP 96. Proceedings, vol. 3. IEEE; 1996. p. 1962–5.
- [9] Deller JR, Hsu D, Ferrier LJ. On the use of hidden Markov modelling for recognition of dysarthric speech. Comput Methods Progr Biomed 1991;35(2):125–39.
- [10] Raghavendra P, Rosengren E, Hunnicutt S. An investigation of different degrees of dysarthric speech as input to speaker-adaptive and speaker-dependent recognition systems. Augment Altern Commun 2001;17(4):265–75.
- [11] Hawley MS, Enderby P, Green P, Cunningham S, Brownsell S, Carmichael J, et al. A speech-controlled environmental control system for people with severe dysarthria. Med Eng Phys 2007;29(5):586–93.
- [12] Sanders E, Ruiter MB, Beijer L, Strik H. Automatic recognition of Dutch dysarthric speech: a pilot study. INTERSPEECH; 2002.
- [13] Jayaram G, Abdelhamied K. Experiments in dysarthric speech recognition using artificial neural networks. J Rehabil Res Dev 1995;32. 162–162.
- [14] Shahamiri SR, Binti Salim S. A multi-views multi-learners approach towards dysarthric speech recognition using multi-nets artificial neural networks; 2014.
- [15] Rudzicz F. Articulatory knowledge in the recognition of dysarthric speech. IEEE Trans Audio Speech Lang Process 2011;19(4):947–60.
- [16] Polur PD, Miller GE. Investigation of an HMM/ANN hybrid structure in pattern recognition application using cepstral analysis of dysarthric (distorted) speech signals. Med Eng Phys 2006;28(8):741–8.
- [17] Kim H, Hasegawa-Johnson M, Perlman A, Gunderson J, Huang TS, Watkin K, et al. Dysarthric speech database for universal access research. INTERSPEECH; 2008. p. 1741–4.
- [18] Sharma HV, Hasegawa-Johnson M. State-transition interpolation and MAP adaptation for HMM-based dysarthric speech recognition. Proceedings of the NAACL HLT 2010 Workshop on Speech and Language Processing for Assistive Technologies; 2010. p. 72–9.
- [19] Zue V, Seneff S, Glass J. Speech database development at MIT: TIMIT and beyond. Speech Commun 1990;9(4):351–6.
- [20] Walter O, Despotovic V, Haeb-Umbach R, Gemmeke J, Ons B, et al. An evaluation of unsupervised acoustic model training for a dysarthric speech interface. INTERSPEECH; 2014.
- [21] De Pauw G, Daelemans W, Huyghe J, Derboven J, Vuegen L, Van Den Broeck B, et al. Self-taught assistive vocal interfaces: an overview of the Aladdin project; 2013.
- [22] Kim J, Kumar N, Tsiartas A, Li M, Narayanan SS. Automatic intelligibility classification of sentence-level pathological speech. Comput Speech Lang 2015;29(1):132–44.
- [23] Clapham RP, van der Molen L, van Son R, van den Brekel M, Hilgers FJ. NKI-CCRT corpus: speech intelligibility before and after advanced head and neck cancer treated with concomitant chemoradiotherapy; 2012.
- [24] Rudzicz F, Namasivayam AK, Wolff T. The TORGO database of acoustic and articulatory speech from speakers with dysarthria. Lang Resour Eval 2012;46(4):523–41.
- [25] Lee C, Rabiner L, Pieraccini R, Wilpon J. Acoustic modeling for large vocabulary speech recognition. Comput Speech Lang 1990;4(2):127–65.
- [26] Gauvain JL, Lee CH. Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains. IEEE Trans Speech Audio Process 1994;2(2):291–8.
- [27] Rabiner L, Juang BH. An introduction to hidden Markov models. ASSP Magazine IEEE 1986;3(1):4–16.
- [28] Burges CJ. A tutorial on support vector machines for pattern recognition. Data Min Knowl Discov 1998;2(2):121–67.
- [29] Cristianini N, Shawe-Taylor J. An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press; 2000.
- [30] Wiśniewski M, Kuniszyk-Jóźkowiak W, Smołka E, Suszyński W. Automatic detection of disorders in a continuous speech with the hidden Markov models approach. Computer Recognition Systems 2. Springer; 2007. p. 445–53.
- [31] Godino-Llorente JI, Gomez-Vilda P. Automatic detection of voice impairments by means of short-term cepstral parameters and neural network based detectors. IEEE Trans Biomed Eng 2004;51(2):380–4.
- [32] Jurafsky D, Martin JH. Speech & language processing. Pearson Education India; 2000.
- [33] Shahamiri SR, Salim SSB. Artificial neural networks as speech recognisers for dysarthric speech: identifying the best-performing set of MFCC parameters and studying a speaker-independent approach. Adv Eng Inf 2014;28(1):102–10.
- [34] Murphy K. Hidden Markov model (HMM) toolbox for MATLAB; 1998. Available at: http://www.ai.mit.edu/murphyk/Software/HMM/hmm.html.
- [35] Chang CC, Lin CJ. LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol (TIST) 2011;2(3):27.
Notes
PL
Prepared with funds from the Ministry of Science and Higher Education (MNiSW) under agreement 812/P-DUN/2016 for science dissemination activities.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-26c8193d-b105-4315-809c-9828d787b3f4