Title variants
Publication languages
Abstracts
More than 90% of patients with Parkinson’s disease suffer from hypokinetic dysarthria. This paper proposes a novel end-to-end deep learning model for Parkinson’s disease detection from speech signals. The proposed model extracts time series dynamic features using time-distributed two-dimensional convolutional neural networks (2D-CNNs), and then captures the dependencies between these time series using a one-dimensional CNN (1D-CNN). The performance of the proposed model was verified on two databases. On Database-1, the proposed model outperformed expert features-based machine learning models and achieved promising results, showing accuracies of 81.6% on the speech task of sustained vowel /a/ and 75.3% on the speech task of reading a short sentence (/si shi si zhi shi shi zi/) in Chinese. On Database-2, the proposed model was assessed on multiple sound types, including vowels, words, and sentences. An accuracy of up to 92% was obtained on the speech tasks, which included reading simple (/loslibros/) and complex (/viste/) sentences in Spanish. By visualizing the features generated by the model, it was found that the learned time series dynamic features are able to capture the characteristics of the reduced overall frequency range and reduced variability of Parkinson’s disease sounds, which are important clinical evidence for detecting Parkinson’s disease patients. The results also suggest that the low-frequency region of the Mel-spectrogram is more influential and important than the high-frequency region for Parkinson’s disease detection from speech.
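The data flow described in the abstract (a shared 2D convolution applied to each time segment of a spectrogram, followed by a 1D convolution across the resulting feature sequence) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, segment width, kernel sizes, pooling choices, and the random untrained weights are all assumptions made only to show how the shapes move through such a model.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def split_segments(spec, seg_width):
    """Cut a (n_mels, n_frames) spectrogram into non-overlapping time segments."""
    n_mels, n_frames = spec.shape
    n_seg = n_frames // seg_width
    trimmed = spec[:, :n_seg * seg_width]
    # Result shape: (n_seg, n_mels, seg_width)
    return trimmed.reshape(n_mels, n_seg, seg_width).transpose(1, 0, 2)


def conv2d_valid(image, kernels):
    """'Valid' 2D cross-correlation of one image with a kernel bank, then ReLU."""
    _, kh, kw = kernels.shape
    windows = sliding_window_view(image, (kh, kw))  # (H-kh+1, W-kw+1, kh, kw)
    out = np.einsum("ijhw,khw->kij", windows, kernels)
    return np.maximum(out, 0.0)


def time_distributed_2d(segments, kernels):
    """Apply the same 2D kernels to every segment; average-pool each feature map."""
    feats = [conv2d_valid(seg, kernels).mean(axis=(1, 2)) for seg in segments]
    return np.stack(feats)  # (n_seg, n_kernels)


def conv1d_over_time(seq, kernels):
    """'Valid' 1D convolution along the segment axis, then ReLU."""
    # seq: (n_seg, n_feat); kernels: (n_out, width, n_feat)
    width = kernels.shape[1]
    windows = sliding_window_view(seq, width, axis=0)  # (n_seg-width+1, n_feat, width)
    out = np.einsum("sfw,owf->so", windows, kernels)
    return np.maximum(out, 0.0)


def predict(spec, k2d, k1d, w_out, b_out, seg_width=8):
    """Return a probability-like score in (0, 1) for one spectrogram."""
    segments = split_segments(spec, seg_width)
    seq = time_distributed_2d(segments, k2d)   # per-segment dynamic features
    temporal = conv1d_over_time(seq, k1d)      # dependencies between segments
    pooled = temporal.max(axis=0)              # global max pooling over time
    logit = pooled @ w_out + b_out
    return 1.0 / (1.0 + np.exp(-logit))


# Made-up sizes and random (untrained) weights, for shape checking only.
rng = np.random.default_rng(0)
spec = rng.standard_normal((40, 64))        # 40 Mel bands, 64 frames
k2d = rng.standard_normal((4, 3, 3)) * 0.1  # four 3x3 kernels
k1d = rng.standard_normal((6, 3, 4)) * 0.1  # six width-3 kernels over 4 features
w_out = rng.standard_normal(6) * 0.1
prob = predict(spec, k2d, k1d, w_out, 0.0)
```

Sharing one kernel bank across all segments is what "time-distributed" refers to here: each segment is summarized by the same feature extractor, so the 1D stage sees a comparable feature vector at every time step.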
Journal
Year
Volume
Pages
556--574
Physical description
Bibliography: 53 items, figures, tables, charts.
Authors
author
- Graduate School of System Informatics, Kobe University, 1-1, Rokkodai-cho, Nada-ku, Kobe, 657-8501 Kobe, Japan, quanchqin@gold.kobe-u.ac.jp
author
- GYENNO Technologies CO., Ltd., Shenzhen, PR China
author
- Graduate School of System Informatics, Kobe University, Kobe, Japan
author
- GYENNO Technologies CO., Ltd., Shenzhen, PR China
author
- GYENNO Technologies CO., Ltd., Shenzhen, PR China
Bibliography
- [1] Dorsey ER, Sherer T, Okun MS, Bloem BR. The emerging evidence of the parkinson pandemic. J Parkinsons Dis 2018;8 (s1):S3–8. https://doi.org/10.3233/JPD-181474.
- [2] Braak H, Del Tredici K. Non-dopaminergic pathology of parkinson’s disease. In Olanow CW, Stocchi F, Lang AE, (Eds.) Parkinson’s Disease: Non-Motor and Non-Dopaminergic Features. Wiley-Blackwell; 2011. pp. 15–31. doi: 10.1002/9781444397970.ch3.
- [3] Moya-Galé G, Levy ES. Parkinson’s disease-associated dysarthria: prevalence, impact and management strategies. Res Rev Parkinsonism 2019;9:9–16. https://doi.org/10.2147/JPRLS.S168090.
- [4] Smith KM, Caplan DN. Communication impairment in parkinson’s disease: Impact of motor and cognitive symptoms on speech and language. Brain Lang 2018;185:38–46. https://doi.org/10.1016/j.bandl.2018.08.002.
- [5] Auclair-Ouellet N, Lieberman P, Monchi O. Contribution of language studies to the understanding of cognitive impairment and its progression over time in parkinson’s disease. Neurosci Biobehav Rev 2017;80:657–72. https://doi.org/10.1016/j.neubiorev.2017.07.014.
- [6] Harel B, Cannizzaro M, Snyder P. Variability in fundamental frequency during speech in prodromal and incipient parkinson’s disease: A longitudinal case study. Brain Cogn 2004;56:24–9. https://doi.org/10.1016/j.bandc.2004.05.002.
- [7] Narendra NP, Schuller B, Alku P. The detection of parkinson’s disease from speech using voice source information. IEEE/ACM Trans Audio Speech Lang Process 2021;29:1925–36. https://doi.org/10.1109/TASLP.2021.3078364.
- [8] Tsanas A, Little MA, McSharry PE, Spielman J, Ramig LO. Novel speech signal processing algorithms for high-accuracy classification of parkinson’s disease. IEEE Trans Biomed Eng 2012;59:1264–71. https://doi.org/10.1109/TBME.2012.2183367.
- [9] Sakar BE, Isenkul M, Sakar CO, Sertbas A, Gurgen F, Delil S, et al. Collection and analysis of a parkinson speech dataset with multiple types of sound recordings. IEEE J Biomed Health Inf 2013;17:828–34. https://doi.org/10.1109/JBHI.2013.2245674.
- [10] Lahmiri S, Dawson DA, Shmuel A. Performance of machine learning methods in diagnosing parkinson’s disease based on dysphonia measures. Biomed Eng Lett 2017;8:29–39. https://doi.org/10.1007/s13534-017-0051-2.
- [11] Saloni RK, Gupta AK. Detection of parkinson disease using clinical voice data mining. Int J Circuits Syst Signal Process 2015;9:320–6.
- [12] García AM, Arias-Vergara T, Vasquez-Correa JC, Nöth E, Schuster M, Welch AE, et al. Cognitive determinants of dysarthria in parkinson’s disease: An automated machine learning approach. Mov Disord 2021;36:2862–73. https://doi.org/10.1002/mds.28751.
- [13] Zhang T, Zhang Y, Sun H, Shan H. Parkinson disease detection using energy direction features based on emd from voice signal. Biocybern Biomed Eng 2021;41:127–41. https://doi.org/10.1016/j.bbe.2020.12.009.
- [14] Tuncer T, Dogan S, Acharya UR. Automated detection of parkinson’s disease using minimum average maximum tree and singular value decomposition method with vowels. Biocybern Biomed Eng 2020;40:211–20. https://doi.org/10.1016/j.bbe.2019.05.006.
- [15] Shahbakhi M, Far D, Tahami E. Speech analysis for diagnosis of parkinson’s disease using genetic algorithm and support vector machine. J Biomed Sci Eng 2014;7:147–56. https://doi.org/10.4236/jbise.2014.74019.
- [16] Meghraoui D, Boudraa B, Merazi-Meksen T, Boudraa M. Parkinson’s disease recognition by speech acoustic parameters classification. Model and Implementation Complex Syst 2016;1:160–73. https://doi.org/10.1007/978-3-319-33410-3_12.
- [17] Despotovic V, Skovranek T, Schommer C. Speech based estimation of parkinson’s disease using gaussian processes and automatic relevance determination. Neurocomputing 2020;401:173–81. https://doi.org/10.1016/j.neucom.2020.03.058.
- [18] Vaiciukynas E, Verikas A, Gelzinis A, Bacauskiene M. Detecting parkinson’s disease from sustained phonation and speech signals. PLoS One 2017;12. https://doi.org/10.1371/journal.pone.0185613.
- [19] Karimi Rouzbahani H, Daliri MR. Diagnosis of parkinson’s disease in human using voice signals. Basic Clin Neurosci 2011;2:12–20.
- [20] Karan B, Sahu SS, Mahto K. Parkinson disease prediction using intrinsic mode function based features from speech signal. Biocybern Biomed Eng 2019;40:249–64. https://doi.org/10.1016/j.bbe.2019.05.005.
- [21] Karan B, Sahu SS, Orozco-Arroyave JR, Mahto K. Nonnegative matrix factorization-based time-frequency feature extraction of voice signal for parkinson’s disease prediction. Comput Speech Language 2021;69. doi: 10.1016/j.csl.2021.101216.
- [22] Karan B, Sahu SS. An improved framework for parkinson’s disease prediction using variational mode decomposition-hilbert spectrum of speech signal. Biocybern Biomed Eng 2021;41:717–32. https://doi.org/10.1016/j.bbe.2021.04.014.
- [23] Karan B, Sahu SS, Orozco-Arroyave JR, Mahto K. Hilbert spectrum analysis for automatic detection and evaluation of parkinson’s speech. Biomed Signal Process Control 2020;61. doi: 10.1016/j.bspc.2020.102050.
- [24] Parisi L, RaviChandran N, Manaog ML. Feature-driven machine learning to improve early diagnosis of parkinson’s disease. Expert Syst Appl 2018;110:182–90. https://doi.org/10.1016/j.eswa.2018.06.003.
- [25] Gunduz H. Deep learning-based parkinson’s disease classification using vocal feature sets. IEEE Access 2019;7:115540–51. https://doi.org/10.1109/ACCESS.2019.2936564.
- [26] Nagasubramanian G, Sankayya M. Multi-variate vocal data analysis for detection of parkinson disease using deep learning. Neural Comput Appl 2021;33:4849–64. https://doi.org/10.1007/s00521-020-05233-7.
- [27] Fang H, Gong C, Zhang C, Sui Y, Li L. Parkinsonian chinese speech analysis towards automatic classification of parkinson’s disease. In Proc. Machine Learning Research; 2020. pp. 114–125.
- [28] Sainath TN, Weiss RJ, Senior A, Wilson K, Vinyals O. Learning the speech front-end with raw waveform cldnns. Proc. INTERSPEECH 2015.
- [29] Quan C, Ren K, Luo Z. A deep learning based method for parkinson’s disease detection using dynamic features of speech. IEEE Access 2021;9:10239–52. https://doi.org/10.1109/ACCESS.2021.3051432.
- [30] Vásquez-Correa JC, Arias-Vergara T, Orozco-Arroyave JR, Nöth E. A multitask learning approach to assess the dysarthria severity in patients with parkinson’s disease. Proc. INTERSPEECH 2018:456–60. https://doi.org/10.21437/Interspeech.2018-1988.
- [31] Vásquez-Correa JC, Orozco-Arroyave JR, Nöth E. Convolutional neural network to model articulation impairments in patients with parkinson’s disease. Proc. INTERSPEECH 2017:314–8. https://doi.org/10.21437/Interspeech.2017-1078.
- [32] Vásquez-Correa JC, Arias-Vergara T, Orozco-Arroyave JR, Eskofier B, Klucken J, Nöth E. Multimodal assessment of parkinson’s disease: A deep learning approach. IEEE J Biomed Health Inf 2019;23:1618–30. https://doi.org/10.1109/JBHI.2018.2866873.
- [33] Wodzinski M, Skalski A, Hemmerling D, Orozco-Arroyave JR, Nöth E. Deep learning approach to parkinson’s disease detection using voice recordings and convolutional neural network dedicated to image classification. In: Proc. 41st Annual International Conf. of the IEEE Engineering in Medicine and Biology Society. p. 717–20. https://doi.org/10.1109/EMBC.2019.8856972.
- [34] Fujita T, Luo Z, Quan C, Mori K, Cao S. Performance evaluation of rnn with hyperbolic secant in gate structure through application of parkinson’s disease detection. Appl Sci 2021;11. https://doi.org/10.3390/app11104361.
- [35] Wang Z, Yan W, Oates T. Time series classification from scratch with deep neural networks: A strong baseline. In Proc. International Joint Conference on Neural Networks; 2017. pp. 1578–1585. doi: 10.1109/IJCNN.2017.7966039.
- [36] Zheng Y, Liu Q, Chen E, Ge Y, Leon Zhao L. Exploiting multi-channels deep convolutional neural networks for multivariate time series classification. Front Comput Sci 2016;10:96–112. https://doi.org/10.1007/s11704-015-4478-2.
- [37] Cui Z, Chen W, Chen Y. Multi-scale convolutional neural networks for time series classification. arXiv 2016;abs/1603.06995.
- [38] Zhao B, Lu H, Chen S, Liu J, Wu D. Convolutional neural networks for time series classification. Syst Eng Electron 2017;28:162–9. https://doi.org/10.21629/JSEE.2017.01.18.
- [39] Ismail Fawaz H, Forestier G, Weber J. Deep learning for time series classification: a review. Data Min Knowl Disc 2019;33:917–63. https://doi.org/10.1007/s10618-019-00619-1.
- [40] Orozco-Arroyave JR, Arias-Londoño JD, Vargas-Bonilla JF, González-Rátiva MC, Nöth E. New spanish speech corpus database for the analysis of people suffering from parkinson’s disease. Proc. LREC 2014:342–7.
- [41] Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine learning in python. J Mach Learn Res 2011;12:2825–30. URL: https://scikit-learn.org/stable/.
- [42] Orozco-Arroyave JR, Vásquez-Correa JC, Vargas-Bonilla JF, Arora R, Dehak N, Nidadavolu PS, et al. Neurospeech: An open-source software for parkinson’s speech analysis. Digital Signal Process 2018;77:207–21. https://doi.org/10.1016/j.dsp.2017.07.004.
- [43] Lenain R, Weston J, Shivkumar A, Fristed E. Surfboard: Audio feature extraction for modern machine learning. ArXiv 2020;2005.08848.
- [44] Orozco-Arroyave JR, Belalcazar-Bolaños EA, Arias-Londoño JD, Vargas-Bonilla JF, Skodda S, Rusz J, et al. Characterization methods for the detection of multiple voice disorders: Neurological, functional, and laryngeal diseases. IEEE J Biomed Health Inf 2015;19:1820–8. https://doi.org/10.1109/JBHI.2015.2467375.
- [45] Serrá J, Pascual S, Karatzoglou A. Towards a universal neural network encoder for time series. In Artificial Intelligence Research and Development; vol. 308. 2018. pp. 120–129. doi:10.3233/978-1-61499-918-8-120.
- [46] Lim W, Jang D, Lee T. Speech emotion recognition using convolutional and recurrent neural networks. In: Proc. Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA). p. 1–4. https://doi.org/10.1109/APSIPA.2016.7820699.
- [47] Satt A, Rozenberg S, Hoory R. Efficient emotion recognition from speech using deep learning on spectrograms. Proc. INTERSPEECH 2017:1089–93. https://doi.org/10.21437/Interspeech.2017-200.
- [48] Guo L, Wang L, Dang J, Zhang L, Guan H, Li X. Speech emotion recognition by combining amplitude and phase information using convolutional neural network. Proc. INTERSPEECH 2018:1611–5. https://doi.org/10.21437/Interspeech.2018-2156.
- [49] Mehmet BE, Esme I, Ibrahim I. Parkinson’s detection based on combined cnn and lstm using enhanced speech signals with variational mode decomposition. Biomed Signal Process Control 2021;70. https://doi.org/10.1016/j.bspc.2021.103006.
- [50] Autonomio talos. http://github.com/autonomio/talos; Accessed on: 10 Jan. 2021.
- [51] McFee B, McVicar M, Raffel C, Liang D, Nieto O, Moore J, et al. Librosa: v0.5.0. 2021, doi: 10.5281/zenodo.293021; Accessed on: 20 Jan.
- [52] Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-cam: Visual explanations from deep networks via gradient-based localization. Proc. ICCV 2017:618–26. https://doi.org/10.1109/ICCV.2017.74.
- [53] Khare SK, Bajaj V, Acharya UR. Detection of parkinson’s disease using automated tunable q wavelet transform technique with eeg signals. Biocybern Biomed Eng 2021;41:679–89. https://doi.org/10.1016/j.bbe.2021.04.008.
Document type
Bibliography
Identifiers
YADDA identifier
bwmeta1.element.baztech-b33e8232-faf2-48dd-a2b0-dd0967dee29a