Identifiers
Title variants
Publication languages
Abstracts
Automatic assessment of voice disorders is one of the most important applications of speech signal analysis. Various algorithms utilizing both sustained vowels and continuous speech have been successfully used to detect many voice pathologies, e.g. dysphonia, laryngitis, and vocal fold paralysis. However, algorithms described in the literature for classifying Reinke’s edema, one of the most severe smoking-induced voice conditions, are scarce and rely mostly on speech signals containing sustained vowels. In this paper, a method incorporating x-vectors based on gammatone frequency cepstral coefficients (GFCC) and extracted from continuous speech is presented. The extracted x-vectors are used to train an SGD classifier that performs Reinke’s edema detection. For the validation folds, the proposed method yielded AUC ROC, accuracy, recall, and specificity of 0.96 (±0.03), 0.94 (±0.02), 0.92 (±0.03), and 0.94 (±0.02), respectively. For the test set, the method yielded AUC ROC, accuracy, recall, and specificity of 0.98, 0.89, 0.88, and 0.89, respectively.
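The evaluation scheme named in the abstract (an SGD classifier trained on fixed-length embeddings and scored per validation fold with AUC ROC, accuracy, recall, and specificity) can be sketched as follows. This is a minimal illustration only, not the authors' implementation: the embeddings are random synthetic stand-ins for GFCC-based x-vectors, and the helper names, embedding dimension, fold count, hinge loss, and feature scaling are all assumptions that the abstract does not specify.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_synthetic_embeddings(n_per_class=100, dim=32, seed=0):
    """Random stand-ins for x-vectors (hypothetical data, NOT real speech embeddings)."""
    rng = np.random.default_rng(seed)
    healthy = rng.normal(0.0, 1.0, size=(n_per_class, dim))
    pathological = rng.normal(0.8, 1.0, size=(n_per_class, dim))
    X = np.vstack([healthy, pathological])
    y = np.concatenate([np.zeros(n_per_class, dtype=int), np.ones(n_per_class, dtype=int)])
    return X, y


def cross_validate_sgd(X, y, n_splits=5, seed=0):
    """Return per-fold AUC ROC, accuracy, recall, and specificity as arrays."""
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = {"auc_roc": [], "accuracy": [], "recall": [], "specificity": []}
    for train_idx, val_idx in folds.split(X, y):
        # Assumed setup: standardized features and a linear SVM-style hinge loss.
        model = make_pipeline(StandardScaler(), SGDClassifier(loss="hinge", random_state=seed))
        model.fit(X[train_idx], y[train_idx])
        y_pred = model.predict(X[val_idx])
        y_score = model.decision_function(X[val_idx])
        tn, fp, fn, tp = confusion_matrix(y[val_idx], y_pred, labels=[0, 1]).ravel()
        scores["auc_roc"].append(roc_auc_score(y[val_idx], y_score))
        scores["accuracy"].append(accuracy_score(y[val_idx], y_pred))
        scores["recall"].append(recall_score(y[val_idx], y_pred))  # sensitivity: tp / (tp + fn)
        scores["specificity"].append(tn / (tn + fp))
    return {name: np.array(values) for name, values in scores.items()}


X, y = make_synthetic_embeddings()
results = cross_validate_sgd(X, y)
for name, values in results.items():
    # Same "mean (±std)" layout as the validation-fold figures in the abstract.
    print(f"{name}: {values.mean():.2f} (±{values.std():.2f})")
```

The numbers printed by this sketch describe the synthetic data only and say nothing about the results reported in the abstract. Specificity is computed from the confusion matrix because scikit-learn has no dedicated specificity scorer; it is the recall of the negative (healthy) class.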
Journal
Year
Volume
Pages
art. no. 2022307
Physical description
Bibliography: 24 items, color illustrations, 1 figure.
Authors
author
- AGH University of Science and Technology, Department of Mechanics and Vibroacoustics, al. Mickiewicza 30, 30-059 Krakow, Poland
Bibliography
- 1. D. Hemmerling, A. Skalski, J. Gajda; Voice data mining for laryngeal pathology assessment; Computers in Biology and Medicine 2015, 69, 270-276. DOI: 10.1016/j.compbiomed.2015.07.026
- 2. M. Vasilakis, Y. Stylianou; Voice pathology detection based on short-term jitter estimations in running speech; Folia phoniatrica et logopaedica: official organ of the International Association of Logopedics and Phoniatrics (IALP) 2009, 61(3), 153-170. DOI: 10.1159/000219951
- 3. H. Cordeiro, C. Meneses, J. Fonseca; Continuous speech classification systems for voice pathologies identification; IFIP Advances in Information and Communication Technology 2015, 450, 217-224.
- 4. W. Wszołek, A. Izworski, G. Izworski; Signal processing and analysis of pathological speech using artificial intelligence and learning systems methods; Acta Physica Polonica A 2013, 123(6), 995-1000.
- 5. Z. W. Engel, M. Kłaczyński, W. Wszołek; A vibroacoustic model of selected human larynx diseases; International Journal of Occupational Safety and Ergonomics 2007, 13(4), 367-379. DOI: 10.1080/10803548.2007.11105094
- 6. M. Madruga, Y. Campos-Roca, C. K. Pérez; Robustness Assessment of Automatic Reinke’s Edema Diagnosis Systems; Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, Barcelona, Spain, 2020.
- 7. Massachusetts Eye and Ear Infirmary; Voice disorders database, ver. 1.03. Kay Elemetrics Corp., Lincoln Park, New Jersey, USA, 1994.
- 8. G. Stegmann et al.; Repeatability of Commonly Used Speech and Language Features for Clinical Applications; Digital Biomarkers 2020, 4(3), 109-122. DOI: 10.1159/000511671
- 9. K. Kotarba, M. Kotarba; Voice pathology assessment using x-vectors approach; Vibrations in Physical Systems 2021, 32(1), 2021108. DOI: 10.21008/j.0860-6897.2021.1.08
- 10. L. Jeancolas et al.; X-vectors: New quantitative biomarkers for early Parkinson’s disease detection from speech; Front. Neuroinform. 2021, 15, Article 578369. DOI: 10.3389/fninf.2021.578369
- 11. C. Botelho et al.; Pathological speech detection using x-vector embeddings; arXiv: 2003.00864 [eess.AS] 2020. DOI: 10.48550/arXiv.2003.00864
- 12. W. Barry, M. Pützer; Saarbrücken voice database; Institute of Phonetics, Univ. of Saarland. Online. Available: http://www.stimmdatenbank.coli.uni-saarland.de
- 13. M. Kumar et al.; Designing neural speaker embeddings with meta learning; arXiv: 2007.16196 [eess.AS] 2020. DOI: 10.48550/arXiv.2007.16196
- 14. J. Deng et al.; ArcFace: Additive Angular Margin Loss for Deep Face Recognition; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, California, USA, 2019.
- 15. J. S. Chung, A. Nagrani, A. Zisserman; VoxCeleb2: Deep speaker recognition; Proceedings of the Interspeech, Hyderabad, India, 2018.
- 16. F. Pedregosa et al.; Scikit-learn: Machine learning in Python; Journal of Machine Learning Research 2011, 12, 2825-2830.
- 17. N. Halko, P.-G. Martinsson, J. A. Tropp; Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions; SIAM Review 2011, 53(2), 217-288. DOI: 10.1137/090771806
- 18. T. Akiba et al.; Optuna: A next-generation hyperparameter optimization framework; Proceedings of the 25th International Conference on Knowledge Discovery and Data Mining, Anchorage, Alaska, USA, 2019.
- 19. J. Olczak et al.; Presenting artificial intelligence, deep learning, and machine learning studies to clinicians and healthcare stakeholders: an introductory reference with a guideline and a Clinical AI Research (CAIR) checklist proposal; Acta Orthopaedica 2021, 92(5), 513-525. DOI: 10.1080/17453674.2021.1918389
- 20. J. Sidey-Gibbons, Ch. Sidey-Gibbons; Machine learning in medicine: a practical introduction; BMC Med. Res. Methodol. 2019, 19, Article 64. DOI: 10.1186/s12874-019-0681-4
- 21. T. Fawcett; An introduction to ROC analysis; Pattern Recognition Letters 2006, 27(8), 861-874. DOI: 10.1016/j.patrec.2005.10.010
- 22. C. Calì, M. Longobardi; Some mathematical properties of the ROC curve and their applications; Ricerche di Matematica 2015, 64(2), 391-402.
- 23. R. Rao, G. Fung; On the dangers of cross-validation. An experimental evaluation; Proceedings of the SIAM International Conference on Data Mining, Atlanta, Georgia, USA, 2008.
- 24. S. Tabe-Bordbar et al.; A closer look at cross-validation for assessing the accuracy of gene regulatory networks and models; Scientific Reports 2018, 8, Article 6620. DOI: 10.1038/s41598-018-24937-4
Notes
Record prepared with funding from MEiN, agreement no. SONP/SP/546092/2022, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science) - module: Popularization of science and promotion of sport (2022-2023).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-f8bb71a7-c951-4d82-a55c-ee27873f83c2