Automatic identification of dysphonias using machine learning algorithms

Bello-Rivera, Miguel Angel; Reyes-García, Carlos Alberto; Talavera-Rojas, Tania Cristal; Quintero-Flores, Perfecto Malaquías; Pérez-Loaiza, Rodolfo Eleazar

doi:10.35784/acs-2023-32

Abstract

Dysphonia is a prevalent symptom of some respiratory diseases that affects voice quality, sometimes for prolonged periods. To diagnose it, speech-language pathologists use acoustic parameters to evaluate patients objectively and determine which type of dysphonia affects them, for example hyperfunctional or hypofunctional dysphonia. The distinction matters because each type requires a different treatment. In artificial intelligence, the problem has been addressed by using acoustic parameters as input data to train machine learning and deep learning models. However, such models usually aim only to identify whether a patient is ill, performing binary classification between healthy voices and voices with dysphonia rather than distinguishing between types of dysphonia. In this paper, the harmonic-to-noise ratio, smoothed cepstral peak prominence, zero-crossing rate and the means of Mel-frequency cepstral coefficients 2-19 are used for multiclass classification of voices with euphony, hyperfunction and hypofunction by six machine learning algorithms: Random Forest, K-Nearest Neighbors, Logistic Regression, Decision Trees, Support Vector Machines and Naive Bayes. The .632 bootstrap was used to evaluate which algorithm best identifies the three voice classes. The best confidence interval for accuracy, 87% to 92%, was obtained by the K-Nearest Neighbors model. The results can be used to develop a complementary application for clinical diagnosis or patient monitoring under the supervision of a specialist.
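
The abstract names the .632 bootstrap and a K-Nearest Neighbors classifier but gives no implementation details, so the following is only a minimal sketch of how such an evaluation could look, not the authors' code. It assumes synthetic two-dimensional clusters in place of the real acoustic features, a hand-written k-NN with k = 3, 50 bootstrap rounds, and a 95% percentile interval over the per-round scores. All function names (`knn_predict`, `bootstrap_632`, `make_demo_data`) and parameter values are hypothetical.

```python
import math
import random
from collections import Counter

WEIGHT_OOB = 0.632  # weight from Efron's .632 estimator (about 1 - 1/e)


def knn_predict(train_X, train_y, x, k=3):
    """Return the majority label among the k training points nearest to x."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test points that k-NN labels correctly."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y
               for x, y in zip(test_X, test_y))
    return hits / len(test_y)


def bootstrap_632(X, y, n_rounds=50, k=3, seed=0):
    """Return (mean .632 accuracy, (lower, upper) 95% percentile interval)."""
    rng = random.Random(seed)
    n = len(y)
    resub = accuracy(X, y, X, y, k)  # optimistic: tested on the training data
    scores = []
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        drawn = set(idx)
        oob = [i for i in range(n) if i not in drawn]  # out-of-bag points
        if not oob:
            continue
        oob_acc = accuracy([X[i] for i in idx], [y[i] for i in idx],
                           [X[i] for i in oob], [y[i] for i in oob], k)
        # Blend the optimistic and the pessimistic estimate.
        scores.append((1 - WEIGHT_OOB) * resub + WEIGHT_OOB * oob_acc)
    if not scores:
        raise ValueError("no bootstrap round produced out-of-bag samples")
    scores.sort()
    m = len(scores)
    lower = scores[int(0.025 * m)]
    upper = scores[min(m - 1, int(0.975 * m))]
    return sum(scores) / m, (lower, upper)


def make_demo_data(per_class=20, seed=1):
    """Synthetic 2-D clusters standing in for real acoustic feature vectors."""
    rng = random.Random(seed)
    centers = {"euphony": (0.0, 0.0), "hyperfunction": (5.0, 0.0),
               "hypofunction": (0.0, 5.0)}
    X, y = [], []
    for label, (cx, cy) in centers.items():
        for _ in range(per_class):
            X.append((rng.gauss(cx, 0.8), rng.gauss(cy, 0.8)))
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_demo_data()
    mean_acc, (lower, upper) = bootstrap_632(X, y)
    print(f"accuracy {mean_acc:.3f}, 95% interval [{lower:.3f}, {upper:.3f}]")
```

In a real study each row of `X` would hold the measured features for one recording (for instance the 21 values listed in the abstract), and the same loop would be repeated for each of the six algorithms so that their intervals can be compared.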

Keywords

dysphonia; machine learning; multiclass classification; voice signal

Publisher

Polskie Towarzystwo Promocji Wiedzy
Lublin University of Technology

Journal

Applied Computer Science

Year

2023

Volume

Vol. 19, no. 4

Pages

14-25

Physical description

Bibliography: 26 items, figures, tables.

Authors

author

Bello-Rivera Miguel Angel

podriaservirte@gmail.com

Tecnológico Nacional de México, Campus Apizaco, Departamento de Sistemas Computacionales, México

https://orcid.org/0009-0003-6641-3094

author

Reyes-García Carlos Alberto

kargaxxi@inaoep.mx

Instituto Nacional de Astrofísica, Óptica y Electrónica, Departamento de Ciencias y Tecnologías Biomédicas, México

https://orcid.org/0000-0003-4773-9585

author

Talavera-Rojas Tania Cristal

ttalavera@uaa.edu.py

Universidad Autónoma de Asunción, Facultad de Ciencias de la Salud, Departamento de Neuropsicología, Paraguay

https://orcid.org/0000-0001-7656-3115

author

Quintero-Flores Perfecto Malaquías

perfecto.qf@apizaco.tecnm.mx

Tecnológico Nacional de México, Campus Apizaco, Departamento de Sistemas Computacionales, México

https://orcid.org/0000-0001-7651-4364

author

Pérez-Loaiza Rodolfo Eleazar

rodolfo.pl@apizaco.tecnm.mx

Tecnológico Nacional de México, Campus Apizaco, Departamento de Sistemas Computacionales, México

https://orcid.org/0000-0002-6500-258X

References

[1] Altayeb, M., & Al-Ghraibah, A. (2022). Classification of three pathological voices based on specific features groups using support vector machine. International Journal of Electrical and Computer Engineering (IJECE), 12(1), 946-956. https://doi.org/10.11591/ijece.v12i1.pp946-956
[2] Behlau, M., & Pontes, P. (1989). Avaliação Global da Voz. Editora Paulista Publicações Médicas.
[3] Behlau, M., Madazio, G., Feijó, D., Azevedo, R., Gielow, I., & Rehder, M. (2005). Perfeccionamiento vocal y tratamiento fonoaudiológico de las disfonías. In M. Behlau (Ed.), Voz: O livro do especialista. Thieme Revinter.
[4] Celdrán, E. M. (2015). Naturaleza fonética de la consonante ‘ye’ en español. Normas: revista de estudios lingüísticos hispánicos, 5, 117-131. https://doi.org/10.7203/Normas.5.6825
[5] Cesari, U., De Pietro, G., Marciano, E., Niri, C., Sannino, G., & Verde, L. (2018). A new database of healthy and pathological voices. Computers & Electrical Engineering, 68, 310-321. https://doi.org/10.1016/j.compeleceng.2018.04.008
[6] Charbuty, B., & Abdulazeez, A. (2021). Classification based on decision tree algorithm for machine learning. Journal of Applied Science and Technology Trends, 2(01), 20-28. https://doi.org/10.38094/jastt20165
[7] Chen, L., & Chen, J. (2022). Deep neural network for automatic classification of pathological voice signals. Journal of Voice, 36(2), 288.e15-288.e24. https://doi.org/10.1016/j.jvoice.2020.05.029
[8] Daniels, L., & Minot, N. (2019). An introduction to statistics and data analysis using Stata®: From research design to final report. Sage Publications.
[9] Descamps, G., Verset, L., Trelcat, A., Hopkins, C., Lechien, J. R., Journe, F., & Saussez, S. (2020). ACE2 protein landscape in the head and neck region: the conundrum of SARS-CoV-2 infection. Biology, 9(8), 235. https://doi.org/10.3390/biology9080235
[10] Efron, B. (1983). Estimating the error rate of a prediction rule: improvement on cross-validation. Journal of the American Statistical Association, 78(382), 316-331. https://doi.org/10.2307/2288636
[11] Farias, P. (2016). Guía clínica para el especialista en laringe y voz. Librería Akadia Editorial.
[12] Flórez-Gómez, A. F., Orozco-Arroyave, J. R., & Roldán-Vasco, S. (2022). Correlación entre espacios de características acústicas del habla y trastornos clínicos de la voz en pacientes con disfagia. TecnoLógicas, 25(53), e2220. https://doi.org/10.22430/22565337.2220
[13] Hassan, A., Shahin, I., & Alsabek, M. B. (2020). COVID-19 detection system using recurrent neural networks. 2020 International conference on communications, computing, cybersecurity, and informatics (CCCI) (pp. 1-5). IEEE. https://doi.org/10.1109/CCCI49893.2020.9256562
[14] Hoffmann, M., Kleine-Weber, H., Schroeder, S., Krüger, N., Herrler, T., Erichsen, S., Schiergens, T. S., Herrler, G., Wu, N.-H., Nitsche, A., Müller, M. A., Drosten, C., & Pöhlmann, S. (2020). SARS-CoV-2 cell entry depends on ACE2 and TMPRSS2 and is blocked by a clinically proven protease inhibitor. Cell, 181(2), 271-280.e8. https://doi.org/10.1016/j.cell.2020.02.052
[15] James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning. Springer.
[16] López, J. A. P. (1997). Los trastornos de la voz en el personal docente de Logroño. Estudio de la voz en los profesionales de la enseñanza (Doctoral dissertation, Universidad de Navarra).
[17] López, J. A. P. (2000). Estudio de la prevalencia de los trastornos de la voz en el personal docente de Logroño. Zubía, 12, 111-145.
[18] Murphy, K. P. (2006). Naive bayes classifiers. University of British Columbia, 18(60), 1-8.
[19] Núñez-Batalla, F., Cartón-Corona, N., Vasile, G., García-Cabo, P., Fernández-Vanes, L., & Llorente-Pendás, J. L. (2019). Validez de las medidas del pico cepstral para la valoración objetiva de la disfonía en sujetos de habla hispana. Acta Otorrinolaringológica Española, 70(4), 222-228. https://doi.org/10.1016/j.otoeng.2018.04.005
[20] Radha, N., Sachin Madhavan, R. M., & Sameera holy, S. (2021). Parkinson’s Disease detection using Machine Learning Techniques. International Journal of Early Childhood Special Education (INT-JECSE), 30(2), 543. https://doi.org/10.24205/03276716.2020.4055
[21] Rivera, M. A. B., Flores, P. M. Q., Loaiza, R. E. P., & Rivera, L. G. (2022). Analysis of audio signals using deep learning algorithms applied to COVID diagnostic systems. 2022 IEEE Mexican International Conference on Computer Science (ENC) (pp. 1-6). IEEE. https://doi.org/10.1109/ENC56672.2022.9882932
[22] Schonlau, M., & Zou, R. Y. (2020). The random forest algorithm for statistical learning. The Stata Journal, 20(1), 3-29. https://doi.org/10.1177/1536867X20909688
[23] Taunk, K., De, S., Verma, S., & Swetapadma, A. (2019). A brief review of nearest neighbor algorithm for learning and classification. 2019 International Conference on Intelligent Computing and Control Systems (ICCS) (pp. 1255-1260). IEEE. https://doi.org/10.1109/ICCS45141.2019.9065747
[24] Verdaguer, J. M., Górriz, C., Prim, M. P., del Palacio, A. J., Gavilán, J., & de Diego, J. I. (2008). Análisis de los cambios en el espectrograma tras la intubación endotraqueal. Acta Otorrinolaringológica Española, 59(5), 217-222. https://doi.org/10.1016/S0001-6519(08)73298-9
[25] Verde, L., De Pietro, G., Alrashoud, M., Ghoneim, A., Al-Mutib, K. N., & Sannino, G. (2019). Leveraging artificial intelligence to improve voice disorder identification through the use of a reliable mobile app. IEEE Access, 7, 124048-124054. https://doi.org/10.1109/ACCESS.2019.2938265
[26] Woldert-Jokisz, B. (2007). Saarbruecken voice database. Computer Science.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-cf92c51b-27da-4f37-97e5-eae2636dc3c9