Publication languages
EN
Abstracts
EN
With the recent development of speech-enabled interactive systems using artificial agents, there has been substantial interest in the analysis and classification of voice disorders to provide more inclusive systems for people living with specific speech and language impairments. In this paper, a two-stage framework is proposed to classify diverse voice pathologies accurately. The first stage consists of speech enhancement processing based on the original premise that an impaired voice can be treated as a noisy signal. To put this hypothesis into practice, the noise le[…] cepstral harmonic-to-noise ratio (CHNR). The second stage consists of a convolutional neural network with long short-term memory (CNN-LSTM) architecture designed to learn complex features from spectrograms of the first-stage enhanced signals. A new sinusoidal rectified unit (SinRU) is proposed as the activation function of the CNN-LSTM network. The experiments are carried out on two subsets of the Saarbruecken voice database (SVD) with different etiologies, covering eight pathologies. The first subset contains voice recordings of patients with vocal cordectomy, psychogenic dysphonia, pachydermia laryngis, and frontolateral partial laryngectomy; the second subset contains voice recordings of patients with vocal fold polyp, chronic laryngitis, functional dysphonia, and vocal cord paresis. Identification of dysarthria severity levels in the Nemours and Torgo databases is also carried out. The experimental results showed that using the minimum mean square error (MMSE)-based signal enhancer prior to the CNN-LSTM network with SinRU led to a significant improvement in the automatic classification of the investigated voice disorders and dysarthria severity levels.
These findings support the hypothesis that appropriate speech enhancement preprocessing improves the accuracy of the automatic classification of voice pathologies by reducing the intrinsic noise induced by the voice impairment.
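The abstract states that the second stage learns from spectrograms of the enhanced signals but gives no analysis settings. As a rough illustration only, the sketch below computes a log-magnitude spectrogram with NumPy. The function name `log_spectrogram`, the 25 ms Hann window, the 10 ms hop, and the 16 kHz sampling rate are all assumptions made for this example, not values taken from the paper, and a synthetic tone stands in for an enhanced recording.

```python
import numpy as np

def log_spectrogram(signal, frame_len=400, hop=160, eps=1e-10):
    """Return a log-magnitude spectrogram of shape (n_frames, frame_len // 2 + 1).

    frame_len and hop are illustrative defaults (25 ms / 10 ms at 16 kHz),
    not settings reported in the abstract.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        # Zero-pad short signals so that at least one full frame exists.
        signal = np.pad(signal, (0, frame_len - signal.size))
    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hanning(frame_len)
    # Slice the signal into overlapping frames and apply the window to each.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # One-sided FFT per frame, then magnitude in decibels.
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return 20.0 * np.log10(magnitude + eps)

# Synthetic stand-in for an enhanced recording: 1 s of a 200 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 200.0 * t)
spec = log_spectrogram(tone)
```

A time-frequency array of this kind is the sort of input a CNN-LSTM classifier could consume, with the convolutional layers reading local spectral patterns and the recurrent layer following them across frames. How the paper actually parameterizes its spectrograms is not stated in this record.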
Authors
  • Laboratory of Speech Communication and Signal Processing, University of Sciences and Technology Houari Boumediene, Algiers, Algeria
  • Research Laboratory in Human-System Interaction, Université de Moncton, Shippagan Campus, New Brunswick, Canada
  • Laboratory of Speech Communication and Signal Processing, University of Sciences and Technology Houari Boumediene, Algiers, Algeria
  • Research Laboratory in Human-System Interaction, Université de Moncton, Shippagan Campus, New Brunswick, Canada
YADDA identifier
bwmeta1.element.baztech-4401d297-63de-4d3b-ab3c-2b1660529f3e