Article title

Voice pathology assessment using x-vectors approach

Publication languages
EN
Abstracts
EN
Voice pathology assessment using sustained vowels has proven to be effective and reliable. However, only a few studies on the detection of pathological speech from continuous speech are available. In this study, we evaluate the usefulness of various regression models trained on continuous speech recordings from the Saarbruecken Voice Database for the detection of voice pathologies. The recordings were used to extract speaker embeddings called x-vectors, based on mel-frequency cepstral coefficients (MFCC) and gammatone frequency cepstral coefficients (GFCC). Since the dataset used in this study is imbalanced, various over- and undersampling techniques were applied to the training set to ensure the robustness of the models' decision boundaries. The models were trained on both the imbalanced and the resampled training sets using 5-fold cross-validation. The best results were obtained for a Multi-Layer Perceptron trained on GFCC-based x-vectors, which achieved an accuracy of 0.8184, an F1-score of 0.8212, and a ROC AUC score of 0.8810 on the testing set.
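The abstract states that resampling was applied to the training set and that models were trained with 5-fold cross-validation. The sketch below illustrates one way such a setup can be arranged so that resampling never touches the held-out fold. It is only an illustration under stated assumptions: plain random oversampling stands in for the unspecified over- and undersampling techniques, the labels are toy data, and the helper names (stratified_kfold, random_oversample, f1_score) are hypothetical and not taken from the paper.

```python
import random
from collections import Counter, defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs that keep class proportions per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)
        # Deal the shuffled indices of each class round-robin into the folds.
        for j, i in enumerate(idxs):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def random_oversample(indices, labels, seed=0):
    """Duplicate minority-class indices until every class is equally frequent.

    Assumption: a simple stand-in for the resampling methods used in the study.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    target = max(len(v) for v in by_class.values())
    out = []
    for idxs in by_class.values():
        out.extend(idxs)
        out.extend(rng.choice(idxs) for _ in range(target - len(idxs)))
    return out


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2*TP / (2*TP + FP + FN) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# Hypothetical toy labels: 80 pathological (1) and 20 healthy (0) recordings.
labels = [1] * 80 + [0] * 20
for fold, (train, test) in enumerate(stratified_kfold(labels, k=5)):
    # Resample the training part only; the test fold keeps its original balance.
    balanced = random_oversample(train, labels, seed=fold)
    counts = Counter(labels[i] for i in balanced)
    print(fold, dict(counts), len(test))
```

Resampling inside each fold, rather than once before splitting, keeps duplicated samples out of the evaluation data, so the reported metrics are not inflated by leakage.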
Year
Pages
art. no. 2021108
Physical description
Bibliography: 50 items, color illustrations.
Authors
  • AGH University of Science and Technology, Department of Mechanics and Vibroacoustics, al. Mickiewicza 30, 30-059 Kraków
  • TECHMO Voice Technologies, ul. Torfowa 1/5, 30-384 Kraków
Bibliography
  • 1. B. Anthony Jnr. Use of telemedicine and virtual care for remote treatment in response to COVID-19 pandemic. Journal of Medical Systems, 44(7):132, 2020.
  • 2. R. Philips, N. Seim, L. Matrka, B. Locklear et al. Cost savings associated with an outpatient otolaryngology telemedicine clinic. Laryngoscope Investigative Otolaryngology, 4(2):234-240, 2019.
  • 3. D. Hemmerling, A. Skalski, J. Gajda. Voice data mining for laryngeal pathology assessment. Computers in Biology and Medicine, 69:270-276, 2016.
  • 4. S.-H. Fang, Y. Tsao, M.-J. Hsiao, J.-Y. Chen et al. Detection of pathological voice using cepstrum vectors: A deep learning approach. Journal of Voice, 33(5):634-641, 2019.
  • 5. A. Al-Nasheri, G. Muhammad, M. Alsulaiman, Z. Ali et al. An investigation of multidimensional voice program parameters in three different databases for voice pathology detection and classification. Journal of Voice, 31(1):113.e9-113.e18, 2017.
  • 6. M. Nicastri, G. Chiarella, L. Gallo, M. Catalano et al. Multidimensional voice program (MDVP) and amplitude variation parameters in euphonic adult subjects. Normative study. Acta otorhinolaryngologica Italica: organo ufficiale della Società italiana di otorinolaringologia e chirurgia cervico-facciale, 24(6):337-341, 2004.
  • 7. M. Vasilakis, Y. Stylianou. Voice pathology detection based on short-term jitter estimations in running speech. Folia phoniatrica et logopaedica: official organ of the International Association of Logopedics and Phoniatrics (IALP), 61(3):153-170, 2009.
  • 8. H. Cordeiro, C. Meneses, J. Fonseca. Continuous speech classification systems for voice pathologies identification. IFIP Advances in Information and Communication Technology, 450:217-224, 2015.
  • 9. V. Guedes, F. Teixeira, A. Oliveira, J. Fernandes et al. Transfer learning with audioset to voice pathologies identification in continuous speech. Procedia Computer Science, 164:662-669, 2019.
  • 10. W. Wszołek, A. Izworski, G. Izworski. Signal processing and analysis of pathological speech using artificial intelligence and learning systems methods. Acta Physica Polonica. A, 123(6):995-1000, 2013.
  • 11. Z. W. Engel, M. Kłaczyński, W. Wszołek. A vibroacoustic model of selected human larynx diseases. International Journal of Occupational Safety and Ergonomics, 13(4):367-379, 2007.
  • 12. D. Snyder, D. Garcia-Romero, D. Povey, S. Khudanpur. Deep neural network embeddings for text-independent speaker verification. INTERSPEECH, 999-1003, 2017.
  • 13. L. Jeancolas, D. Petrovska-Delacrétaz, G. Mangone, B.-E. Benkelfat et al. X-vectors: New quantitative biomarkers for early Parkinson’s disease detection from speech. Front. Neuroinform., 15, 2021.
  • 14. C. Botelho, F. Teixeira, T. Rolland, A. Abad et al. Pathological speech detection using x-vector embeddings. arXiv:2003.00864, 2020.
  • 15. W. Barry, M. Pützer. Saarbrücken voice database. Institute of Phonetics, Univ. of Saarland. [Online]. Available: http://www.stimmdatenbank.coli.uni-saarland.de
  • 16. D. Snyder, D. Garcia-Romero, G. Sell, D. Povey et al. X-vectors: Robust DNN embeddings for speaker recognition. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP):5329-5333, 2018.
  • 17. K. Wu, D. Zhang, G. Lu, Zh. Guo. Influence of sampling rate on voice analysis for assessment of Parkinson's disease. The Journal of the Acoustical Society of America, 144:1416, 2018.
  • 18. P. Grill, J. Tučková. Speech databases of typical children and children with SLI. PLoS ONE, 11(3): e0150365, 2016.
  • 19. M. Farooq, F. Adeeba, S. Hussain. X-vectors based Urdu speaker identification for short utterances. Conference of the Oriental COCOSDA International Committee for the Coordination and Standardisation of Speech Databases and Assessment Techniques, 2019.
  • 20. Y. Shao, Z. Jin, D. Wang, S. Srinivasan. An auditory-based feature for robust speech recognition. IEEE International Conference on Acoustics, Speech and Signal Processing: 4625-4628, 2009.
  • 21. D. Povey, A. Ghoshal, G. Boulianne, L. Burget et al. The Kaldi speech recognition toolkit. IEEE 2011 Workshop on Automatic Speech Recognition and Understanding, 2011.
  • 22. M. Kumar, T. Jin-Park, S. Bishop, C. Lord et al. Designing neural speaker embeddings with meta learning. 2020.
  • 23. A. A. Chaudhari, S. B. Dhonde. Effect of varying MFCC filters for speaker recognition. International Journal of Computer Applications, 128(14), 2015.
  • 24. J. S. Chung, A. Nagrani, A. Zisserman. Voxceleb2: Deep speaker recognition. Proc. Interspeech 2018, 1086-1090, 2018.
  • 25. P. K. Ghosh, A. Tsiartas, S. Narayanan. Robust voice activity detection using long-term signal variability. IEEE Transactions on Audio, Speech, and Language Processing 19(3):600-613, 2011.
  • 26. Featxtra toolbox for Kaldi [Online]. Available: https://github.com/mvansegbroeck-zz/featxtra
  • 27. A. Paszke, S. Gross, F. Massa, A. Lerer et al. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32:8024-8035, 2019.
  • 28. B. Krawczyk. Learning from imbalanced data: Open challenges and future directions. Progress in Artificial Intelligence 5, 2016.
  • 29. N. Chawla, K. Bowyer, L. Hall, W. Kegelmeyer. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. (JAIR), 16:321-357, 2002.
  • 30. H. Nguyen, E. Cooper, K. Kamei. Borderline over-sampling for imbalanced data classification. International Journal of Knowledge Engineering and Soft Data Paradigm, 3:4-21, 2011.
  • 31. H. Han, W.-Y. Wang, B.-H. Mao. Borderline-smote: A new over-sampling method in imbalanced data sets learning. Adv. Intell. Comput., 3644:878-887, 2005.
  • 32. F. Last, G. Douzas, F. Bação. Oversampling for imbalanced learning based on k-means and smote. arXiv, 2017.
  • 33. H. He, Y. Bai, E. Garcia, S. Li. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. Proceedings of the International Joint Conference on Neural Networks, 1322-1328, 2008.
  • 34. M. Kubat. Addressing the curse of imbalanced training sets: One-sided selection. Fourteenth International Conference on Machine Learning, 2000.
  • 35. J. Laurikkala. Improving identification of difficult small classes by balancing class distribution. Proc. 8th Conf AI Med Eur Artif Intell Med:63-66, 2001.
  • 36. P. Hart. The condensed nearest neighbor rule. IEEE Transactions on Information Theory, 14(3):515-516, 1968.
  • 37. D. L. Wilson. Asymptotic properties of nearest neighbor rules using edited data. IEEE Transactions on Systems, Man, and Cybernetics, SMC-2(3):408-421, 1972.
  • 38. I. Tomek. An experiment with the edited nearest-neighbor rule. IEEE Transactions on Systems, Man, and Cybernetics, SMC-6(6):448-452, 1976.
  • 39. J. Zhang, I. Mani. KNN Approach to Unbalanced Data Distributions: A Case Study Involving Information Extraction. Proceedings of the ICML’2003 Workshop on Learning from Imbalanced Datasets, 2003.
  • 40. I. Tomek. Two modifications of CNN. IEEE Transactions on Systems, Man, and Cybernetics, SMC-6(11): 769-772, 1976.
  • 41. Y.-P. Zhang, L.-N. Zhang, Y.-C. Wang. Cluster-based majority under-sampling approaches for class imbalance learning. IEEE International Conference on Information and Financial Engineering, 2010.
  • 42. G. Lemaître, F. Nogueira, C. K. Aridas. Imbalanced-learn: A Python toolbox to tackle the curse of imbalanced datasets in machine learning. Journal of Machine Learning Research, 18(17):1-5, 2017.
  • 43. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel et al. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.
  • 44. T. Akiba, S. Sano, T. Yanase, T. Ohta et al. Optuna: A next-generation hyperparameter optimization framework. 25th International Conference on Knowledge Discovery and Data Mining, 2019.
  • 45. T. Fawcett. An introduction to ROC analysis. Pattern Recognition Letters, 27:861-874, 2006.
  • 46. C. Calì, M. Longobardi. Some mathematical properties of the ROC curve and their applications. Ricerche di Matematica, 64:391-402, 2015.
  • 47. S. Tabe-Bordbar, A. Emad, S. Zhao, S. Sinha. A closer look at cross-validation for assessing the accuracy of gene regulatory networks and models. Scientific Reports, 8:6620, 2018.
  • 48. R. Rao, G. Fung. On the dangers of cross-validation. An experimental evaluation. SIAM International Conference on Data Mining, 588-596, 2008.
  • 49. J. P. Teixeira, P. O. Fernandes, N. Alves. Vocal acoustic analysis - classification of dysphonic voices with artificial neural networks. Procedia Computer Science, 121:19-26, 2017.
  • 50. Massachusetts Eye and Ear Infirmary. Voice disorders database, ver. 1.03. Kay Elemetrics Corp., Lincoln Park, NJ, 1994.
Notes
Record prepared with funds from the Ministry of Science and Higher Education (MNiSW), agreement No. 461252, under the programme "Social Responsibility of Science" - module: Popularisation of science and promotion of sport (2021).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-1ad5ff31-a10a-4c29-a2aa-119cbce53708