Article title

Utilizing CNN architectures for non-invasive diagnosis of speech disorders - further experiments and insights

Publication languages
EN
Abstracts
EN
This research investigated the application of deep neural networks to the diagnosis of diseases affecting the voice and speech mechanisms through non-invasive analysis of vowel sound recordings. Using the Saarbruecken Voice Database, recordings of the vowels /a/, /u/, and /i/ were converted to spectrograms to train the models. The study used Explainable Artificial Intelligence (XAI) methodologies to identify the features within these spectrograms that are essential for pathology identification, with the aim of giving medical professionals greater insight into how diseases manifest in sound production. In the F1-score evaluation, the DenseNet model achieved 0.70 ± 0.03, with a best score of 0.74. The findings indicated that neither vowel selection nor data augmentation strategies significantly improved model performance. Additionally, the research showed that signal splitting was ineffective in enhancing the models' ability to extract features. This study builds on our previous research [1], offering a more comprehensive understanding of the topic.
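
The abstract describes a pipeline of converting vowel recordings into spectrogram images, training CNN classifiers such as DenseNet, and evaluating them with the F1 score. The short sketch below is an illustrative reconstruction of that pipeline, not the authors' code; the file path, sample rate, mel-spectrogram parameters, and the use of librosa, PyTorch/torchvision, and scikit-learn are assumptions made for the example.

import librosa
import numpy as np
import torch
import torchvision
from sklearn.metrics import f1_score

def vowel_to_spectrogram(wav_path, sr=16000):
    """Load a sustained-vowel recording and return a 3-channel log-mel spectrogram tensor."""
    signal, _ = librosa.load(wav_path, sr=sr)                          # resample to a common rate
    mel = librosa.feature.melspectrogram(y=signal, sr=sr, n_mels=128)  # mel-scaled power spectrogram
    log_mel = librosa.power_to_db(mel, ref=np.max)                     # convert power to a dB scale
    log_mel = (log_mel - log_mel.min()) / (log_mel.max() - log_mel.min() + 1e-8)  # scale to [0, 1]
    # Repeat across 3 channels so the image can feed an ImageNet-style CNN.
    return torch.tensor(log_mel).unsqueeze(0).repeat(3, 1, 1).float()

# A DenseNet backbone with a two-class head (healthy vs. pathological voice).
model = torchvision.models.densenet121(weights=None)
model.classifier = torch.nn.Linear(model.classifier.in_features, 2)

# After training, performance would be reported as the F1 score on held-out recordings,
# e.g. f1_score(y_true, y_pred) for ground-truth and predicted labels.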
Authors
  • Wrocław University of Science and Technology, Wrocław, Poland
  • Institute of Data Science, Maastricht University, The Netherlands
author
  • Wrocław University of Science and Technology, Wrocław, Poland
Bibliography
  • [1] F. Ratajczak, M. Najda, and K. Szyc, Utilizing CNN Architectures for Non-invasive Diagnosis of Speech Disorders. Springer Nature Switzerland, 2024, pp. 218-226. [Online]. Available: http://dx.doi.org/10.1007/978-3-031-61857-4_21
  • [2] J. Liu, Y. Pan, M. Li, Z. Chen, L. Tang, C. Lu, and J. Wang, “Applications of deep learning to mri images: A survey,” Big Data Mining and Analytics, vol. 1, no. 1, pp. 1-18, 2018.
  • [3] A. Esteva, A. Robicquet, B. Ramsundar, V. Kuleshov, M. DePristo, K. Chou, C. Cui, G. Corrado, S. Thrun, and J. Dean, “A guide to deep learning in healthcare,” Nature medicine, vol. 25, no. 1, pp. 24-29, 2019.
  • [4] R. Shrivastav, D. A. Eddins, and S. Anand, “Pitch strength of normal and dysphonic voices,” The Journal of the Acoustical Society of America, vol. 131, no. 3, pp. 2261-2269, 2012.
  • [5] R. Deepa, S. Arunkumar, V. Jayaraj, and A. Sivasamy, “Healthcare’s new frontier: Ai-driven early cancer detection for improved well-being,” AIP Advances, vol. 13, no. 11, Nov. 2023. [Online]. Available: http://dx.doi.org/10.1063/5.0177640
  • [6] O. Obulesu, N. Venkateswarulu, M. Sri Vidya, S. Manasa, K. Pranavi, and C. Brahmani, Early Prediction of Healthcare Diseases Using Machine Learning and Deep Learning Techniques. Springer Nature Singapore, 2023, pp. 323-338. [Online]. Available: http://dx.doi.org/10.1007/978-981-99-1588-0_29
  • [7] M. Milling, F. B. Pokorny, K. D. Bartl-Pokorny, and B. W. Schuller, “Is speech the new blood? recent progress in ai-based disease detection from audio in a nutshell,” Frontiers in digital health, vol. 4, p. 886615, 2022.
  • [8] D. Hemmerling, M. Wójcik-Pędziwiatr, P. Jaciów, B. Ziółko, and M. Igras-Cybulska, “Monitoring of Parkinson’s disease progression based on speech signal,” in 2023 6th International Conference on Information and Computer Technologies (ICICT), 2023, pp. 132-137.
  • [9] M. L. Joshi and N. Kanoongo, “Depression detection using emotional artificial intelligence and machine learning: A closer review,” Materials Today: Proceedings, vol. 58, pp. 217-226, 2022.
  • [10] S. Hegde, S. Shetty, S. Rai, and T. Dodderi, “A survey on machine learning approaches for automatic detection of voice disorders,” Journal of Voice, vol. 33, no. 6, pp. 947-e11, 2019.
  • [11] L. van Bemmel, W. Harmsen, C. Cucchiarini, and H. Strik, Automatic Selection of the Most Characterizing Features for Detecting COPD in Speech. Springer International Publishing, 2021, pp. 737-748. [Online]. Available: http://dx.doi.org/10.1007/978-3-030-87802-3_66
  • [12] M. K. Reddy, P. Helkkula, Y. M. Keerthana, K. Kaitue, M. Minkkinen, H. Tolppanen, T. Nieminen, and P. Alku, “The automatic detection of heart failure using speech signals,” Computer Speech & Language, vol. 69, p. 101205, 2021.
  • [13] R. Monir, D. Kostrzewa, and D. Mrozek, “Singing voice detection: a survey,” Entropy, vol. 24, no. 1, p. 114, 2022.
  • [14] B. Sisman, J. Yamagishi, S. King, and H. Li, “An overview of voice conversion and its challenges: From statistical modeling to deep learning,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 132-157, 2020.
  • [15] A. Bakhshi, A. Harimi, and S. Chalup, “Cytex: Transforming speech to textured images for speech emotion recognition,” Speech Communication, vol. 139, pp. 62-75, 2022.
  • [16] E. Keller, “The analysis of voice quality in speech processing,” International School on Neural Networks, Initiated by IIASS and EMFCSC, pp. 54-73, 2004.
  • [17] J. C. B. Gamboa, “Deep learning for time-series analysis,” arXiv preprint arXiv:1701.01887, 2017.
  • [18] J. Li, A. Mohamed, G. Zweig, and Y. Gong, “Lstm time and frequency recurrence for automatic speech recognition,” in 2015 IEEE workshop on automatic speech recognition and understanding (ASRU). IEEE, 2015, pp. 187-191.
  • [19] R. V. Sharan, H. Xiong, and S. Berkovsky, “Benchmarking audio signal representation techniques for classification with convolutional neural networks,” Sensors, vol. 21, no. 10, p. 3434, 2021.
  • [20] S. Seo, C. Kim, and J.-H. Kim, “Convolutional neural networks using log mel-spectrogram separation for audio event classification with unknown devices,” Journal of Web Engineering, pp. 497-522, 2022.
  • [21] M. Huzaifah, “Comparison of time-frequency representations for environmental sound classification using convolutional neural networks,” arXiv preprint arXiv:1706.07156, 2017.
  • [22] T. Bartosch, D. Seidl et al., “Spectrogram analysis of selected tremor signals using short-time fourier transform and continuous wavelet transform,” 1999.
  • [23] P. Abdzadeh and H. Veisi, “A comparison of cqt spectrogram with stft-based acoustic features in deep learning-based synthetic speech detection,” Journal of AI and Data Mining, vol. 11, no. 1, pp. 119-129, 2023.
  • [24] Z. Li, F. Liu, W. Yang, S. Peng, and J. Zhou, “A survey of convolutional neural networks: analysis, applications, and prospects,” IEEE transactions on neural networks and learning systems, 2021.
  • [25] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556, 2014.
  • [26] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770-778.
  • [27] M. Tan and Q. Le, “Efficientnet: Rethinking model scaling for convolutional neural networks,” in International conference on machine learning. PMLR, 2019, pp. 6105-6114.
  • [28] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” 2023. [Online]. Available: https://arxiv.org/abs/1706.03762
  • [29] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” 2021. [Online]. Available: https://arxiv.org/abs/2010.11929
  • [30] Y. Gong, Y.-A. Chung, and J. Glass, “Ast: Audio spectrogram transformer,” 2021. [Online]. Available: https://arxiv.org/abs/2104.01778
  • [31] Saarland University, “Saarbruecken voice database,” database of voice recordings for speech and voice disorders research. [Online]. Available: https://stimmdb.coli.uni-saarland.de/help_en.php4
  • [32] L. Vavrek, M. Hires, D. Kumar, and P. Drotár, “Deep convolutional neural network for detection of pathological speech,” in 2021 IEEE 19th World Symposium on Applied Machine Intelligence and Informatics (SAMI), 2021, pp. 000245-000250.
  • [33] F. T. Al-Dhief, M. M. Baki, N. M. A. Latiff, N. N. N. A. Malik, N. S. Salim, M. A. A. Albader, N. M. Mahyuddin, and M. A. Mohammed, “Voice pathology detection and classification by adopting online sequential extreme learning machine,” IEEE Access, vol. 9, pp. 77293-77306, 2021.
  • [34] H. Ding, Z. Gu, P. Dai, Z. Zhou, L. Wang, and X. Wu, “Deep connected attention (dca) resnet for robust voice pathology detection and classification,” Biomedical Signal Processing and Control, vol. 70, p. 102973, 2021.
  • [35] J.-Y. Lee, “Experimental evaluation of deep learning methods for an intelligent pathological voice detection system using the saarbruecken voice database,” Applied Sciences, vol. 11, no. 15, p. 7149, Aug. 2021. [Online]. Available: http://dx.doi.org/10.3390/app11157149
  • [36] R.-K. Sheu and M. S. Pardeshi, “A survey on medical explainable ai (xai): Recent progress, explainability approach, human interaction and scoring system,” Sensors, vol. 22, no. 20, p. 8068, 2022.
  • [37] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-cam: Visual explanations from deep networks via gradient-based localization,” in Proceedings of the IEEE international conference on computer vision, 2017, pp. 618-626.
  • [38] H. Wang, M. Du, F. Yang, and Z. Zhang, “Score-cam: Improved visual explanations via score-weighted class activation mapping,” CoRR, vol. abs/1910.01279, 2019. [Online]. Available: http://arxiv.org/abs/1910.01279
  • [39] H. G. Ramaswamy et al., “Ablation-cam: Visual explanations for deep convolutional network via gradient-free localization,” in proceedings of the IEEE/CVF winter conference on applications of computer vision, 2020, pp. 983-991.
  • [40] Y. Naqvi and V. Gupta, Functional Voice Disorders. StatPearls Publishing, Treasure Island (FL), 2023. [Online]. Available: http://europepmc.org/books/NBK563182
  • [41] Y. Hwang, H. Cho, H. Yang, D.-O. Won, I. Oh, and S.-W. Lee, “Mel-spectrogram augmentation for sequence to sequence voice conversion,” arXiv preprint arXiv:2001.01401, 2020. [Online]. Available: https://arxiv.org/abs/2001.01401
  • [42] G. Huang, Z. Liu, and K. Q. Weinberger, “Densely connected convolutional networks,” CoRR, vol. abs/1608.06993, 2016. [Online]. Available: http://arxiv.org/abs/1608.06993
  • [43] J. Xu, Y. Pan, X. Pan, S. Hoi, Z. Yi, and Z. Xu, “Regnet: Self-regulated network for image classification,” IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 11, pp. 9562-9567, 2022.
  • [44] H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, and H. Jégou, “Training data-efficient image transformers & distillation through attention,” 2021. [Online]. Available: https://arxiv.org/abs/2012.12877
  • [45] R. Jegan and R. Jayagowri, “Voice pathology detection using optimized convolutional neural networks and explainable artificial intelligence-based analysis,” Computer Methods in Biomechanics and Biomedical Engineering, pp. 1-17, 2023.
Notes
Record prepared using funds from the Ministry of Science and Higher Education (MNiSW), agreement No. POPUL/SP/0154/2024/02, under the programme "Społeczna odpowiedzialność nauki II" (Social Responsibility of Science II), module: Popularisation of Science (2025).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-bb755149-8e89-4123-9d2f-c9f2189d833e