Search results - BazTech

1

Computational intelligence for speech enhancement using deep neural network

Hepsiba D., Justin Judith

Computer Assisted Methods in Engineering and Science | 2022 | Vol. 29, no. 1-2 spec. | 71--85

EN

In real time, the speech signal received contains noise produced in the background and reverberations. These disturbances reduce the quality of speech; therefore, it is important to eliminate the noise and increase the intelligibility and quality of the speech signal. Speech enhancement is the primary task in any real-time application that handles speech signals. In the proposed method, the most effective and challenging noise, i.e., babble noise, is removed, and the clean speech is recovered. The enhancement of the corrupted speech signal is done by applying a deep neural network-based denoising algorithm in which the ideal ratio mask is used to mask the noisy speech and separate the clean speech signal. In the proposed system, the speech signal corrupted by noise is enhanced. Evaluation of the enhanced speech signal by performance metrics such as short-time objective intelligibility and signal-to-noise ratio of the denoised speech shows that the speech intelligibility and speech quality are improved by the proposed method.
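
The ideal ratio mask and the signal-to-noise ratio named in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it computes the oracle mask directly from known clean and noise power values, whereas the paper uses a deep neural network to estimate the mask. The function names, the exponent `beta = 0.5`, and the assumption that speech and noise powers add in each time-frequency bin are all illustrative choices, not details from the paper.

```python
import math


def ideal_ratio_mask(clean_power, noise_power, beta=0.5):
    """Oracle ideal ratio mask per bin: (S / (S + N)) ** beta.

    clean_power and noise_power are equal-length sequences of
    non-negative power values (hypothetical inputs for illustration).
    """
    mask = []
    for s, n in zip(clean_power, noise_power):
        total = s + n
        # An empty bin carries no speech, so its mask value is set to 0.
        mask.append((s / total) ** beta if total > 0 else 0.0)
    return mask


def apply_mask(noisy_magnitude, mask):
    """Scale each noisy magnitude bin by its mask value."""
    return [m * g for m, g in zip(noisy_magnitude, mask)]


def snr_db(clean, estimate):
    """Signal-to-noise ratio in dB of an estimate against the clean signal."""
    signal_energy = sum(c * c for c in clean)
    error_energy = sum((c - e) ** 2 for c, e in zip(clean, estimate))
    if error_energy == 0:
        return math.inf
    return 10.0 * math.log10(signal_energy / error_energy)


# Toy example with three bins, assuming speech and noise powers add.
clean_power = [4.0, 1.0, 0.0]
noise_power = [0.0, 1.0, 2.0]
noisy_magnitude = [math.sqrt(s + n) for s, n in zip(clean_power, noise_power)]

mask = ideal_ratio_mask(clean_power, noise_power)
enhanced_magnitude = apply_mask(noisy_magnitude, mask)
```

Under the additive-power assumption, applying the oracle mask with `beta = 0.5` returns the clean magnitude in every bin, which is why the mask is a common training target for a network that only sees the noisy input.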

2

Voice disorder classification using speech enhancement and deep learning models

Chaiani Mounira, Selouani Sid Ahmed, Boudraa Malika, Sidi Yakoub Mohammed

Biocybernetics and Biomedical Engineering | 2022 | Vol. 42, no. 2 | 463--480

EN

With the recent development of speech-enabled interactive systems using artificial agents, there has been substantial interest in the analysis and classification of voice disorders to provide more inclusive systems for people living with specific speech and language impairments. In this paper, a two-stage framework is proposed to perform an accurate classification of diverse voice pathologies. The first stage consists of speech enhancement processing based on the original premise, which considers impaired voice as a noisy signal. To put this hypothesis into practice, the noise lestral harmonic-to-noise ratio (CHNR). The second stage consists of a convolutional neural network with long short-term memory (CNN-LSTM) architecture designed to learn complex features from spectrograms of the first-stage enhanced signals. A new sinusoidal rectified unit (SinRU) is proposed to be used as an activation function by the CNN-LSTM network. The experiments are carried out by using two subsets of the Saarbruecken voice database (SVD) with different etiologies covering eight pathologies. The first subset contains voice recordings of patients with vocal cordectomy, psychogenic dysphonia, pachydermia laryngis and frontolateral partial laryngectomy, and the second subset contains voice recordings of patients with vocal fold polyp, chronic laryngitis, functional dysphonia, and vocal cord paresis. Identification of dysarthria severity levels in the Nemours and Torgo databases is also carried out. The experimental results showed that using the minimum mean square error (MMSE)-based signal enhancer prior to the CNN-LSTM network using SinRU led to a significant improvement in the automatic classification of the investigated voice disorders and dysarthria severity levels.
These findings support the hypothesis that using an appropriate speech enhancement preprocessing has positive effects on the accuracy of the automatic classification of voice pathologies thanks to the reduction of the intrinsic noise induced by the voice impairment.

3

The Use of Deep Learning in Speech Enhancement

Ram R., Mohanty M. N.

Annals of Computer Science and Information Systems | 2018 | Vol. 14 | 107--111

EN

Deep learning is an emerging area. Convolutional Neural Networks (CNN) and Deep Belief Networks (DBN) are the models most commonly used in deep learning, and such a model is termed a Deep Neural Network (DNN). DNNs are widely used in many applications, especially for detection and classification. In this paper, the authors use the same kind of network for signal enhancement. The input is a speech signal with noise. A DNN model with two layers is used. It is compared with the ADALINE model to demonstrate its efficacy.