Fully automated speaker identification and intelligibility assessment in dysarthria disease using auditory knowledge

Kadi, K. L.; Selouani, S. A.; Boudraa, B.; Boudraa, M.

doi:10.1016/j.bbe.2015.11.004

Identifiers

DOI

10.1016/j.bbe.2015.11.004

Abstract

Millions of children and adults suffer from acquired or congenital neuro-motor communication disorders that can affect their speech intelligibility. The automatic characterization of speech impairment can contribute to improving the patient's quality of life and assist experts in assessment and treatment design. In this paper, we present new approaches to improve the analysis and classification of disordered speech. First, we propose an automatic speaker recognition approach specially adapted to identify dysarthric speakers. Second, we suggest a method for the automatic assessment of the dysarthria severity level. For this purpose, a model simulating the external, middle and inner parts of the ear is presented. This ear model provides relevant auditory-based cues that are combined with the usual Mel-Frequency Cepstral Coefficients (MFCC) to represent atypical speech utterances. The experiments are carried out using data from both the Nemours and Torgo databases of dysarthric speech. Gaussian Mixture Models (GMMs), Support Vector Machines (SVMs) and hybrid GMM/SVM systems are tested and compared in the context of dysarthric speaker identification and assessment. The experimental results show a correct speaker identification rate of 97.2%, which can be considered promising for this novel approach; in addition, existing assessment systems are outperformed, with a 93.2% correct classification rate of dysarthria severity levels.
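
The GMM-based identification step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes diagonal-covariance GMMs whose parameters are already known, and it uses synthetic 2-D vectors in place of the MFCC and auditory-cue features. The speaker labels, model parameters and function names are hypothetical. The decision rule shown is the common maximum average log-likelihood rule: score the utterance under every speaker model and pick the best.

```python
import math
import random


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_avg_loglik(frames, gmm):
    """Average per-frame log-likelihood of frames under a GMM.

    gmm is a list of (weight, mean, var) components.
    """
    total = 0.0
    for x in frames:
        # Log-sum-exp over components for numerical stability.
        terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        top = max(terms)
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total / len(frames)


def identify(frames, speaker_models):
    """Return the speaker whose GMM gives the utterance the highest score."""
    return max(
        speaker_models,
        key=lambda spk: gmm_avg_loglik(frames, speaker_models[spk]),
    )


# Hypothetical two-component models; a real system would fit them with EM
# on each speaker's training features.
speaker_models = {
    "spk_A": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [2.0, 2.0], [1.0, 1.0])],
    "spk_B": [(0.5, [6.0, -3.0], [1.0, 1.0]), (0.5, [8.0, -1.0], [1.0, 1.0])],
}

rng = random.Random(0)
# Synthetic "utterance": 50 frames drawn near one of spk_B's components.
test_frames = [[rng.gauss(8.0, 1.0), rng.gauss(-1.0, 1.0)] for _ in range(50)]
predicted = identify(test_frames, speaker_models)
print(predicted)
```

In a hybrid GMM/SVM arrangement, per-model scores such as those returned by gmm_avg_loglik could instead be passed to an SVM as input features; that variant is not shown here.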

Keywords

dysarthria; speech processing; auditory cues; GMM; SVM; hybrid GMM/SVM

dysarthria; speech processing; Gaussian mixture; support vector machine

Publisher

Nałęcz Institute of Biocybernetics and Biomedical Engineering of the Polish Academy of Sciences
Elsevier

Journal

Biocybernetics and Biomedical Engineering

Year

2016

Volume

Vol. 36, no. 1

Pages

233–247

Physical description

Bibliography: 45 items; figures, tables, charts.

Authors

author

Kadi K. L.

kkadi@usthb.dz

Faculty of Electronics and Computer Science, University of Sciences and Technology Houari Boumediene, 32 El Alia, 16111 Bab Ezzouar, Algiers, Algeria

author

Selouani S. A.

selouani@umcs.ca

Department of Information Management, University of Moncton, Campus of Shippagan, Shippagan, NB E8S 1P6, Canada

author

Boudraa B.

bboudraa@usthb.dz

Faculty of Electronics and Computer Science, University of Sciences and Technology Houari Boumediene, 32 El Alia, 16111 Bab Ezzouar, Algiers, Algeria

author

Boudraa M.

mboudraa@usthb.dz

Faculty of Electronics and Computer Science, University of Sciences and Technology Houari Boumediene, 32 El Alia, 16111 Bab Ezzouar, Algiers, Algeria

Bibliography

[1] Melf RS. Communication disorders. Available from http://emedicine.medscape.com/article.
[2] Roth C. Dysarthria. In: Caplan B, Deluca J, Kreutzer JS, editors. Encyclopedia of clinical neuropsychology. Springer; 2011. p. 905–8.
[3] American Speech-Language-Hearing Association. Available from: http://www.asha.org/.
[4] Enderby P. Disorders of communication: dysarthria. In: Tselis AC, Booss J, editors. Handbook of clinical neurology. Elsevier; 2013.
[5] Rudzicz F. Using articulatory likelihoods in the recognition of dysarthric speech. J Speech Commun 2012;54:430–44.
[6] Rudzicz F. Adjusting dysarthric speech signals to be more intelligible. Comput Speech Lang 2013;27:1163–77.
[7] Menendez-Pidal X, Polikoff JB, Peters SM, Leonzio JE, Bunnell HT. The Nemours database of dysarthric speech. Fourth international conference on spoken language; 1996. pp. 1962–5.
[8] Rudzicz F, Namasivayam AK, Wolff T. The TORGO database of acoustic and articulatory speech from speakers with dysarthria. Lang Resour Eval 2012;46(4):523–41.
[9] O'Shaughnessy D. Speech communication: human and machine. IEEE Press; 2001.
[10] Shahamiri SR, Salim SSB. Artificial neural networks as speech recognizers for dysarthric speech: identifying the best-performing set of MFCC parameters and studying a speaker-independent approach. J Adv Eng Inform 2014;28(1):102–10.
[11] Davis S, Mermelstein P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans Acoust Speech Signal Process 1980;28(4):357–66.
[12] Flanagan JL. Models for approximating basilar membrane displacement. Bell Syst Tech J 1960;39:1163–91.
[13] Lyon RF. A computational model of filtering, detection, and compression in the cochlea. IEEE Int Conf Acoust Speech Signal Process. 1982. pp. 1282–5.
[14] Seneff S. A joint synchrony/mean-rate model of auditory speech processing. J Phonet 1988;16:55–76.
[15] Ghitza O. Auditory models and human performance in tasks related to speech coding and speech recognition. IEEE Trans Speech Audio Proc SAP 1994;2:115–32.
[16] Stern R, Morgan N. Hearing is believing: biologically inspired methods for robust automatic speech recognition. IEEE Signal Process Mag 2012;29(6):34–43.
[17] Schluter R, Bezrukov L, Wagner H, Ney H. Gammatone features and feature combination for large vocabulary speech recognition. Acoustics, speech and signal processing, ICASSP 2007, IEEE international conference, vol. 4. 2007. pp. 649–52.
[18] Kim C, Stern RM. Power-normalized cepstral coefficients (PNCC) for robust speech recognition. Acoustics, speech and signal processing (ICASSP), 2012 IEEE international conference. 2012. pp. 4101–4.
[19] Caelen J. Space/time data-information in the ARIAL project ear model. Speech Commun 1985;4:457–67.
[20] Selouani SA. Speech processing and soft computing. New York: Springer; 2011.
[21] Selouani SA, O'Shaughnessy D, Caelen J. Incorporating phonetic knowledge into an evolutionary subspace approach for robust speech recognition. Int J Comput Appl 2007;29:143–54.
[22] Selouani SA, Caelen J. Recognition of Arabic phonetic features using neural networks and knowledge-based system: a comparative study. Int J Artif Intell Tools 1999;8(1):73–103.
[23] Selouani SA, Tolba H, O'Shaughnessy D. Auditory-based acoustic distinctive features and spectral cues for robust automatic speech recognition in Low-SNR car environments. Conference of the North American Chapter of the association for computational linguistics on human language technology: HLT-NAACL; 2003. pp. 91–3.
[24] Mary L. Extraction and representation of prosody for speaker, speech and language recognition. Springer briefs in speech technology; 2012 [Chap 1].
[25] Kadi KL, Selouani SA, Boudraa B, Boudraa M. Automated diagnosis and assessment of dysarthric speech using relevant prosodic features. In: Yang G, Ao S, Gelman L, editors. Transactions on engineering technologies. Springer; 2013.
[26] Rudzicz F. Phonological features in discriminative classification of dysarthric speech. ICASSP; 2009.
[27] Selouani SA, Dahmani H, Amami R, Hamam H. Using speech rhythm knowledge to improve dysarthric speech recognition. Int J Speech Technol 2012;15(1):57–64.
[28] Paja MS, Falk TH. Automated dysarthria severity classification for improved objective intelligibility assessment of spastic dysarthric speech. Interspeech; 2012.
[29] Kim J, Kumar N, Tsiartas A, Li M, Narayanan S. Automatic intelligibility classification of sentence-level pathological speech. Comput Speech Lang 2014;29(1):132–44.
[30] Enderby PM. Frenchay dysarthria assessment. PRO-ED; 1983.
[31] Enderby PM, Palmer R. Frenchay dysarthria assessment. Second Ed (FDA-2). PRO-ED; 2008.
[32] Baghai-Ravary L, Beet SW. Automatic speech signal analysis for clinical diagnosis and assessment of speech disorders. Springer briefs in electrical and computer engineering; 2013.
[33] LDC. Catalog number LDC2012S02. Linguistic Data Consortium; 2012, Available from: http://catalog.ldc.upenn.edu/docs/LDC2012S02/README.txt.
[34] Deng L, O'Shaughnessy D. Speech processing: a dynamic and optimization-oriented approach. Marcel Dekker, Inc; 2003.
[35] Polikoff JB, Bunnell HT. The Nemours database of dysarthric speech: a perceptual analysis. 14th international congress of phonetic sciences; 1999. pp. 783–6.
[36] Kent RD, Rosen K. Motor control perspectives on motor speech disorders. In: Maassen B, Kent RD, Peters H, Van LP, Hulstijn W, editors. Speech motor control in normal and disordered speech. Oxford University Press; 2004. p. 285–311 [Chap 12].
[37] Rudzicz F. Articulatory knowledge in the recognition of dysarthric speech. IEEE Trans Audio Speech Lang Process 2011;19(4).
[38] Giannakopoulos T, Pikrakis A. Introduction to audio analysis. Academic Press, Elsevier; 2014.
[39] Campbell WM, Karam ZN. A framework for discriminative SVM/GMM systems for language recognition. Interspeech. 2009. pp. 2195–8.
[40] Abd El-Samie FE. Information security for automatic speaker identification. Springer briefs in speech technology. Springer; 2011.
[41] Reynolds DA, Rose RC. Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Trans Speech Audio Process 1995;3(1):72–83.
[42] Dempster AP, Laird NM, Rubin DB. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 1977;39(1):1–38.
[43] Vapnik VN. An overview of statistical learning theory. IEEE Trans Neural Netw 1999;10(5):988–99.
[44] Liu M, Xie Y, Yao Z, Dai B. A new hybrid GMM/SVM for speaker verification. The 18th international conference on pattern recognition. IEEE; 2006.
[45] Fleury A, Vacher M, Noury N. SVM-based multimodal classification of activities of daily living in health smart homes: sensors, algorithms, and first experimental results. IEEE Trans Inform Technol Biomed 2010;14(2):274–83.

Notes

Record prepared with funds from the Polish Ministry of Science and Higher Education (MNiSW) under agreement 812/P-DUN/2016 for activities disseminating science.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-decb3844-2042-4aa8-b1ae-f510e2fc58a5