Article title

Automatic voice pathology detection and classification using vocal tract area irregularity

Identifiers
Title variants
Publication languages
EN
Abstracts
EN
In this paper, an automatic voice pathology detection (VPD) system based on voice production theory is developed. Specifically, features are extracted from the vocal tract area, which is connected to the glottis. Voice pathology is related to vocal fold problems; hence, for a pathological voice producing a sustained vowel, the vocal tract area connected to the vocal folds (glottis) should exhibit irregular patterns across frames. This irregularity is quantified as different moments computed across the frames and used to distinguish normal from pathological voices. The proposed VPD system, which combines vocal tract irregularity features with a support vector machine classifier, is evaluated on sustained vowel samples from the Massachusetts Eye and Ear Infirmary (MEEI) database and the Saarbrucken Voice Database (SVD). It achieves 99.22% ± 0.01 accuracy on the MEEI database and 94.7% ± 0.21 accuracy on the SVD. The results indicate that vocal tract irregularity measures can be used effectively in automatic voice pathology detection.
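The abstract gives no implementation details, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes the vocal tract area is estimated per frame from linear-prediction reflection coefficients using a lossless-tube model, and that the "moments" are the mean, variance, skewness, and kurtosis of each tube section's area across frames. The function names, frame length, hop size, model order, and the sign convention of the area ratio are all hypothetical choices made for illustration.

```python
import numpy as np

def levinson(r, order):
    """Levinson-Durbin recursion: autocorrelation lags r[0..order] -> reflection coefficients."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0] + 1e-12          # small offset guards against an all-zero frame
    k = np.zeros(order)
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k[i - 1] = -acc / err
        prev = a.copy()
        a[1:i] = prev[1:i] + k[i - 1] * prev[i - 1:0:-1]
        a[i] = k[i - 1]
        err *= 1.0 - k[i - 1] ** 2
    return k

def area_function(k):
    """Tube-section areas relative to a unit reference section (sign convention is an assumption)."""
    return np.cumprod((1.0 - k) / (1.0 + k))

def tract_areas(signal, frame_len, hop, order):
    """One row of relative areas per frame, using Hamming-windowed autocorrelation LPC."""
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        w = signal[start:start + frame_len] * np.hamming(frame_len)
        r = np.correlate(w, w, mode="full")[frame_len - 1:frame_len + order]
        rows.append(area_function(levinson(r, order)))
    return np.array(rows)

def irregularity_features(areas):
    """Mean, variance, skewness, and kurtosis of each section's area across frames."""
    mu = areas.mean(axis=0)
    sd = areas.std(axis=0)
    z = (areas - mu) / (sd + 1e-12)
    return np.concatenate([mu, sd ** 2, (z ** 3).mean(axis=0), (z ** 4).mean(axis=0)])
```

In a complete system, one such feature vector per recording would be passed to a support vector machine classifier, as the abstract describes; that step is omitted here.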
Authors
author
  • Speech Processing Group, Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, P.O. Box 51178, Riyadh 11543, Saudi Arabia
  • Speech Processing Group, Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
  • Speech Processing Group, Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
author
  • Speech Processing Group, Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia; Centre for Intelligent Signal and Imaging Research, Department of Electrical and Electronic Engineering, Universiti Teknologi PETRONAS, Tronoh, Perak, Malaysia
  • ENT Department, College of Medicine, King Saud University, Riyadh, Saudi Arabia; Research Chair of Voice, Swallowing, and Communication Disorders, King Saud University, Riyadh, Saudi Arabia; ENT Department, College of Medicine, Al-Menoufiya University, Shebin Alkoum, Egypt
author
  • ENT Department, College of Medicine, King Saud University, Riyadh, Saudi Arabia; Research Chair of Voice, Swallowing, and Communication Disorders, King Saud University, Riyadh, Saudi Arabia
author
  • ENT Department, College of Medicine, King Saud University, Riyadh, Saudi Arabia; Research Chair of Voice, Swallowing, and Communication Disorders, King Saud University, Riyadh, Saudi Arabia
  • Speech Processing Group, Department of Computer Engineering, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
Bibliography
  • [1] Kreiman J, Gerratt BR, Precoda K. Listener experience and perception of voice quality. J Speech Hear Res 1990;33:103–15.
  • [2] Lieberman P. Perturbation in vocal pitch. J Acoust Soc Am 1961;33:597–603.
  • [3] Titze IR, Liang H. Comparison of F0 extraction methods for high-precision voice perturbation measurements. J Speech Hear Res 1993;36:1120–33.
  • [4] Muhammad G, Alsulaiman M, Mahmood A, Ali Z. Automatic voice disorder classification using vowel formants. IEEE International Conference on Multimedia and Expo (ICME) – Workshop MUST-EH 2011; 2011.
  • [5] Moran RJ, Reilly RB, Chazal P, Lacy PD. Telephony-based voice pathology assessment using automated speech analysis. IEEE Trans Biomed Eng 2006;53(3):468–77.
  • [6] Heman-Ackah YD, Heuer RJ, Michael DD, Ostrowski R, Horman M, Baroody MM, et al. Cepstral peak prominence: a more reliable measure of dysphonia. Ann Otol Rhinol Laryngol 2003;112(4):324–33.
  • [7] Vasilakis M, Stylianou Y. Spectral jitter modeling and estimation. Biomed Signal Process Control 2009;183–93.
  • [8] Martin D, Fitch J, Wolfe V. Pathologic voice type and the acoustic prediction of severity. J Speech Hear Res 1995;38:765–71.
  • [9] Shrivastav R. The use of an auditory model in predicting perceptual ratings of breathy voice quality. J Voice 2003;17(4):502–12.
  • [10] Little MA, Costello DAE, Harries ML. Objective dysphonia quantification in vocal fold paralysis: comparing nonlinear with classical measures. J Voice 2009;25(1):21–31.
  • [11] Markaki M, Stylianou Y. Voice pathology detection and discrimination based on modulation spectral features. IEEE Trans Speech Audio Process 2011;19(7):1938–48.
  • [12] Arias-Londono JD, Godino-Llorente JI, Saenz-Lechon N, Osma-Ruiz V, Castellanos-Dominguez G. Automatic detection of pathological voices using complexity measures, noise parameters and mel-cepstral coefficients. IEEE Trans Biomed Eng 2011;58(2):370–9.
  • [13] Godino-Llorente J, Gomez-Vilda P, Blanco-Velasco M. Dimensionality reduction of a pathological voice quality assessment system based on Gaussian mixture models and short-term cepstral parameters. IEEE Trans Biomed Eng 2006;53(10):1943–53.
  • [14] Kay Elemetrics Corp. Disordered Voice Database (CD-ROM), Version 1.03. Boston, MA: Massachusetts Eye and Ear Infirmary (MEEI), Voice and Speech Lab; 1994.
  • [15] Muhammad G, Melhem M. Pathological voice detection and binary classification using MPEG-7 audio features. Biomed Signal Process Control 2014;11:1–9.
  • [16] Hossain MS, Muhammad G. Cloud-assisted speech and face recognition framework for health monitoring. Mob Netw Appl 2015;20(3):391–9.
  • [17] Ali Z, Elamvazuthi I, Alsulaiman M, Muhammad G. Automatic voice pathology detection with running speech by using estimation of auditory spectrum and cepstral coefficients based on the all-pole model. J Voice 2015. http://dx.doi.org/10.1016/j.jvoice.2015.08.010.
  • [18] Lowell SY, Colton RH, Kelley RT, Hahn YC. Spectral- and cepstral-based measures during continuous speech: capacity to distinguish dysphonia and consistency within a speaker. J Voice 2011;25(5):e223–32.
  • [19] Muhammad G, Mesallam TA, Almalki KH, Farahat M, Mahmood A, Alsulaiman M. Multi directional regression (MDR) based features for automatic voice disorder detection. J Voice 2012;26(6):817.e19–817.e27.
  • [20] Kent RD, Kim Y. Acoustic analysis of speech. In: Ball MJ, Perkins MR, Müller N, Howard S, editors. The Handbook of Clinical Linguistics. Oxford, UK: Blackwell Publishing Ltd.; 2008. p. 364–5. http://dx.doi.org/10.1002/9781444301007.ch22.
  • [21] Lee JW, Kang HG, Choi JY, Son YI. An investigation of vocal tract characteristics for acoustic discrimination of pathological voices. BioMed Res Int 2013;2013. http://dx.doi.org/10.1155/2013/758731. Article ID 758731.
  • [22] Parsa V, Jamieson D. Identification of pathological voices using glottal noise measures. J Speech Lang Hear Res 2000;43(2):469–85.
  • [23] Campbell JP, Reynolds DA. Corpora for the evaluation of speaker recognition systems. Proc. International Conference on Acoustics, Speech, and Signal Processing (ICASSP) '99, vol. 2; 1999. pp. 829–32.
  • [24] Saarbrucken Voice Database (SVD), version 2.0. Available at http://www.stimmdatenbank.coli.uni-saarland.de/help_en.php4 [accessed December 2015].
  • [25] Muhammad G. Voice pathology detection using vocal tract area. Proc. of European Modeling Symposium (EMS2013); 2013.
  • [26] Markel JE, Gray AH. Linear prediction of speech. Secaucus, NJ: Springer-Verlag New York, Inc.; 1982.
  • [27] Chang CC, Lin CJ. LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2011;2(27):1–27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm.
  • [28] Martinez D, Lleida E, Ortega A, Miguel A. Score level versus audio level fusion for voice pathology detection on the Saarbrucken Voice Database. Advances in Speech and Language Technologies for Iberian Languages. Communications in Computer and Information Science, vol. 328; 2012. p. 110–20.
  • [29] Nunes RB, de Souza A, Duprat A, Silva M, Costa R, Paulino J. Vocal tract analysis in patients with vocal fold nodules, clefts and cysts. Braz J Otorhinolaryngol 2009;75(2):188–92.
Notes
PL
Prepared with funds from the Ministry of Science and Higher Education (MNiSW) under agreement 812/P-DUN/2016 for science dissemination activities.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-b067f4da-9962-419d-a451-a2121fb2e786