GFCC-based x-vectors for Reinke’s edema detection

Kotarba, Katarzyna

doi:10.21008/j.0860-6897.2022.3.07

Abstract

Automatic assessment of voice disorders is one of the most important applications of speech signal analysis. Algorithms using both sustained vowels and continuous speech have been successfully applied to the detection of many voice pathologies, e.g. dysphonia, laryngitis, and vocal fold paralysis. However, few algorithms for classifying Reinke’s edema, one of the most severe smoking-induced voice conditions, have been described in the literature, and those that exist rely mostly on recordings of sustained vowels. This paper presents a method that uses x-vectors based on gammatone frequency cepstral coefficients (GFCC) and extracted from continuous speech. The extracted x-vectors are used to train an SGD (stochastic gradient descent) classifier that performs Reinke’s edema detection. On the validation folds, the proposed method yielded AUC ROC, accuracy, recall, and specificity of 0.96 (±0.03), 0.94 (±0.02), 0.92 (±0.03), and 0.94 (±0.02), respectively. On the test set, it yielded AUC ROC, accuracy, recall, and specificity of 0.98, 0.89, 0.88, and 0.89, respectively.
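The four figures reported above follow the standard binary-classification definitions, with Reinke’s edema as the positive class. The sketch below is a minimal illustration of how they relate to the confusion matrix, not the author’s evaluation code. It assumes labels coded 1 = edema and 0 = healthy, a real-valued classifier score per recording, and a decision threshold of 0.5; the helper name `binary_metrics` and the toy data are hypothetical. AUC ROC is computed through its rank (Mann-Whitney) interpretation instead of by tracing the curve.

```python
from itertools import product


def binary_metrics(y_true, y_score, threshold=0.5):
    """Hypothetical helper: AUC ROC, accuracy, recall and specificity.

    Assumes label 1 = pathological (positive), 0 = healthy (negative) and
    that y_score is a real-valued score where higher means "more pathological".
    """
    # Threshold the scores (0.5 is an assumed default, not from the paper).
    y_pred = [1 if s >= threshold else 0 for s in y_score]

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    recall = tp / (tp + fn)       # sensitivity: share of edema cases detected
    specificity = tn / (tn + fp)  # share of healthy speakers correctly rejected

    # AUC ROC = probability that a random positive outscores a random negative
    # (ties count as 1/2), i.e. the normalised Mann-Whitney U statistic.
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p, n in product(pos, neg))
    auc = wins / (len(pos) * len(neg))

    return {"auc_roc": auc, "accuracy": accuracy,
            "recall": recall, "specificity": specificity}


# Toy example (invented data): 4 edema recordings, 6 healthy recordings.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05, 0.55]
print(binary_metrics(y_true, y_score))
# auc_roc 0.875, accuracy 0.7, recall 0.75, specificity 4/6 ≈ 0.667
```

Because AUC ROC is threshold-independent while the other three metrics depend on the chosen cut-off, a high AUC can coexist with somewhat lower accuracy, recall, and specificity at a fixed threshold. In practice, scikit-learn’s `roc_auc_score` and `confusion_matrix` compute the same quantities.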

Keywords

x-vectors; Reinke’s edema; voice pathology classification

Publisher

Poznan University of Technology. Institute of Applied Mechanics

Journal

Vibrations in Physical Systems

Year

2022

Volume

Vol. 33, no. 3

Pages

art. no. 2022307

Physical description

Bibliography: 24 items, color illustrations, 1 figure

Creators

Author

Kotarba Katarzyna

urbaniec@agh.edu.pl

AGH University of Science and Technology, Department of Mechanics and Vibroacoustics, al. Mickiewicza 30, 30-059 Krakow, Poland

https://orcid.org/0000-0002-5323-8958

References

1. D. Hemmerling, A. Skalski, J. Gajda; Voice data mining for laryngeal pathology assessment; Computers in Biology and Medicine 2015, 69, 270-276. DOI: 10.1016/j.compbiomed.2015.07.026
2. M. Vasilakis, Y. Stylianou; Voice pathology detection based on short-term jitter estimations in running speech; Folia Phoniatrica et Logopaedica 2009, 61(3), 153-170. DOI: 10.1159/000219951
3. H. Cordeiro, C. Meneses, J. Fonseca; Continuous speech classification systems for voice pathologies identification; IFIP Advances in Information and Communication Technology 2015, 450, 217-224.
4. W. Wszołek, A. Izworski, G. Izworski; Signal processing and analysis of pathological speech using artificial intelligence and learning systems methods; Acta Physica Polonica A 2013, 123(6), 995-1000.
5. Z. W. Engel, M. Kłaczyński, W. Wszołek; A vibroacoustic model of selected human larynx diseases; International Journal of Occupational Safety and Ergonomics 2007, 13(4), 367-379. DOI: 10.1080/10803548.2007.11105094
6. M. Madruga, Y. Campos-Roca, C. K. Pérez; Robustness Assessment of Automatic Reinke’s Edema Diagnosis Systems; Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, Spain, 2020.
7. Massachusetts Eye and Ear Infirmary; Voice disorders database, ver. 1.03. Kay Elemetrics Corp., Lincoln Park, New Jersey, USA, 1994.
8. G. Stegmann et al.; Repeatability of Commonly Used Speech and Language Features for Clinical Applications; Digital Biomarkers 2020, 4(3), 109-122. DOI: 10.1159/000511671
9. K. Kotarba, M. Kotarba; Voice pathology assessment using x-vectors approach; Vibrations in Physical Systems 2021, 32(1), 2021108. DOI: 10.21008/j.0860-6897.2021.1.08
10. L. Jeancolas et al.; X-vectors: New quantitative biomarkers for early Parkinson’s disease detection from speech; Front. Neuroinform. 2021, 15, Article 578369. DOI: 10.3389/fninf.2021.578369
11. C. Botelho et al.; Pathological speech detection using x-vector embeddings; arXiv: 2003.00864 [eess.AS] 2020. DOI: 10.48550/arXiv.2003.00864
12. W. Barry, M. Pützer; Saarbrücken voice database; Institute of Phonetics, Univ. of Saarland. Online. Available: http://www.stimmdatenbank.coli.uni-saarland.de
13. M. Kumar et al.; Designing neural speaker embeddings with meta learning; arXiv: 2007.16196 [eess.AS] 2020. DOI: 10.48550/arXiv.2007.16196
14. J. Deng et al.; ArcFace: Additive Angular Margin Loss for Deep Face Recognition; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, California, USA, 2019.
15. J. S. Chung, A. Nagrani, A. Zisserman; VoxCeleb2: Deep speaker recognition; Proceedings of the Interspeech, Hyderabad, India, 2018.
16. F. Pedregosa et al.; Scikit-learn: Machine learning in Python; Journal of Machine Learning Research 2011, 12, 2825-2830.
17. N. Halko, P.-G. Martinsson, J. A. Tropp; Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions; SIAM Review 2011, 53(2), 217-288. DOI: 10.1137/090771806
18. T. Akiba et al.; Optuna: A next-generation hyperparameter optimization framework; Proceedings of the 25th International Conference on Knowledge Discovery and Data Mining, Anchorage, Alaska, USA, 2019.
19. J. Olczak et al.; Presenting artificial intelligence, deep learning, and machine learning studies to clinicians and healthcare stakeholders: an introductory reference with a guideline and a Clinical AI Research (CAIR) checklist proposal; Acta Orthopaedica 2021, 92(5), 513-525. DOI: 10.1080/17453674.2021.1918389
20. J. Sidey-Gibbons, Ch. Sidey-Gibbons; Machine learning in medicine: a practical introduction; BMC Med. Res. Methodol. 2019, 19, Article 64. DOI: 10.1186/s12874-019-0681-4
21. T. Fawcett; An introduction to ROC analysis; Pattern Recognition Letters 2006, 27(8), 861-874. DOI: 10.1016/j.patrec.2005.10.010
22. C. Calì, M. Longobardi; Some mathematical properties of the ROC curve and their applications; Ricerche di Matematica 2015, 64(2), 391-402.
23. R. Rao, G. Fung; On the dangers of cross-validation. An experimental evaluation; Proceedings of the SIAM International Conference on Data Mining, Atlanta, Georgia, USA, 2008.
24. S. Tabe-Bordbar et al.; A closer look at cross-validation for assessing the accuracy of gene regulatory networks and models; Scientific Reports 2018, 8, Article 6620. DOI: 10.1038/s41598-018-24937-4

Notes

Record prepared with funding from MEiN (Polish Ministry of Education and Science), agreement no. SONP/SP/546092/2022, under the "Społeczna odpowiedzialność nauki" (Social Responsibility of Science) program, module: Popularization of science and promotion of sport (2022-2023).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-f8bb71a7-c951-4d82-a55c-ee27873f83c2