Article title

Speech Enhancement Using Sliding Window Empirical Mode Decomposition and Hurst-based Technique

Content
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
The most challenging aspect of speech enhancement is tracking non-stationary noise over long speech segments at low Signal-to-Noise Ratio (SNR). Various speech enhancement techniques have been proposed, but they track highly non-stationary noise inaccurately. The Empirical Mode Decomposition and Hurst-based (EMDH) approach was therefore proposed to enhance signals corrupted by non-stationary acoustic noise. It uses Hurst exponent statistics to identify and select the set of Intrinsic Mode Functions (IMFs) most affected by the noise components, and it reconstructs the speech signal from the least corrupted IMFs. Although EMDH increases SNR, its time and resource consumption are high, and it still requires significant improvement under non-stationary noise. Hence, in this article, the EMDH approach is enhanced with a Sliding Window (SW) technique. In the resulting SWEMDH approach, EMD is computed over a small window that slides along the time axis. The sliding window depends on the signal frequency band. Possible discontinuities in the IMFs between windows are prevented by setting the total number of modes and the number of sifting iterations a priori. For each mode, the number of sifting iterations is determined by decomposing many signal windows with the standard algorithm and averaging the number of sifting steps for that mode. This approach reduces the time complexity significantly while keeping a suitable quality of decomposition. Finally, the experimental results show considerable improvements in speech enhancement in non-stationary noise environments.
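The a priori settings described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `sliding_windows` and `average_sifting_steps`, the hop size between windows, and the calibration counts are all hypothetical, and the EMD step itself is left out. The sketch only shows how per-mode sifting counts recorded from standard EMD runs might be averaged into one fixed plan, and how window positions along the time axis might be generated.

```python
from statistics import mean


def sliding_windows(n_samples, window_len, hop):
    """Return (start, stop) index pairs for windows sliding along the time axis.

    The hop (shift between consecutive windows) is an assumption; the
    abstract does not state how far the window moves at each step.
    """
    if window_len <= 0 or hop <= 0:
        raise ValueError("window_len and hop must be positive")
    if n_samples < window_len:
        return []
    return [(start, start + window_len)
            for start in range(0, n_samples - window_len + 1, hop)]


def average_sifting_steps(counts_per_window, n_modes):
    """Average the observed sifting steps per mode over many windows.

    counts_per_window[w][m] is the number of sifting steps a standard EMD
    run needed for mode m in window w (hypothetical recorded values).
    A window that produced fewer than n_modes modes does not contribute
    to the modes it lacks.
    """
    fixed = []
    for m in range(n_modes):
        observed = [c[m] for c in counts_per_window if len(c) > m]
        if not observed:
            raise ValueError(f"no window produced mode {m}")
        # Convert the average to an integer count, at least one step.
        fixed.append(max(1, round(mean(observed))))
    return fixed


# Hypothetical sifting counts from three calibration windows.
calibration = [[12, 8, 5], [10, 9, 3], [14, 7]]
sift_plan = average_sifting_steps(calibration, n_modes=3)
windows = sliding_windows(n_samples=1000, window_len=400, hop=200)
```

With the number of modes and the per-mode sifting counts fixed in this way, every window would be decomposed with identical settings, which is the condition the abstract gives for avoiding discontinuities in the IMFs between windows.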
Year
Pages
429-437
Physical description
Bibliography: 21 items, figures, tables, charts.
Authors
  • Department of Computer Science, Bharathiar University, Coimbatore, India
  • Department of Computer Science, Bharathiar University, Coimbatore, India
Bibliography
  • 1. Chatlani N., Soraghan J. J. (2012), EMD-based filtering (EMDF) of low-frequency noise for speech enhancement, IEEE Transactions on Audio, Speech, and Language Processing, 20, 4, 1158-1166.
  • 2. Dwijayanti S., Yamamori K., Miyoshi M. (2018), Enhancement of speech dynamics for voice activity detection using DNN, EURASIP Journal on Audio, Speech, and Music Processing, 2018, 10, 15 pages.
  • 3. Gerkmann T., Hendriks R. C. (2012), Unbiased MMSE-based noise power estimation with low complexity and low tracking delay, IEEE Transactions on Audio, Speech, and Language Processing, 20, 4, 1383-1393.
  • 4. Ghahabi O., Zhou W., Fischer V. (2018), A robust voice activity detection for real-time automatic speech recognition, [in:] Proceedings of ESSV 2018, Ulm, Germany.
  • 5. Hawaldar S., Dixit M. (2011), Speech enhancement for non-stationary noise environments, Signal & Image Processing, 2, 4, 129-136.
  • 6. Ji Y., Baek Y., Park Y. C. (2017a), Robust noise power spectral density estimation for binaural speech enhancement in time-varying diffuse noise field, EURASIP Journal on Audio, Speech, and Music Processing, 2017, 1, 25.
  • 7. Jin Y. G., Shin J. W., Kim N. S. (2017b), Decision-directed speech power spectral density matrix estimation for multichannel speech enhancement, The Journal of the Acoustical Society of America, 141, 3, EL228-EL233.
  • 8. Kasap C., Arslan M. L. (2013), A unified approach to speech enhancement and voice activity detection, Turkish Journal of Electrical Engineering & Computer Sciences, 21, 2, 527-547.
  • 9. Khaldi K., Boudraa A. O., Komaty A. (2014), Speech enhancement using empirical mode decomposition and the Teager-Kaiser energy operator, The Journal of the Acoustical Society of America, 135, 1, 451-459.
  • 10. Kulkarni D. S., Deshmukh R. R., Shrishrimal P. P. (2016), A review of speech signal enhancement techniques, International Journal of Computer Applications, 139, 14, 23-26.
  • 11. Mai V. K., Pastor D., Aïssa-El-Bey A., Le-Bidan R. (2015), Robust estimation of non-stationary noise power spectrum for speech enhancement, IEEE Transactions on Audio, Speech, and Language Processing, 23, 4, 670-682.
  • 12. Mandic D. P., Rehman N. U., Wu Z., Huang N. E. (2013), Empirical mode decomposition-based time-frequency analysis of multivariate signals: the power of adaptive data analysis, IEEE Signal Processing Magazine, 30, 6, 74-86.
  • 13. Mert A., Akan A. (2014), Detrended fluctuation thresholding for empirical mode decomposition based denoising, Digital Signal Processing, 32, 48-56.
  • 14. Soni M. H., Shah N., Patil H. A. (2018), Time-frequency masking-based speech enhancement using generative adversarial network, [in:] 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 5039-5043.
  • 15. Taal C. H., Hendriks R. C., Heusdens R., Jensen J. (2011), An algorithm for intelligibility prediction of time-frequency weighted noisy speech, IEEE Transactions on Audio, Speech, and Language Processing, 19, 7, 2125-2136.
  • 16. wa Maina C., Walsh J. M. (2011), Joint speech enhancement and speaker identification using approximate Bayesian inference, IEEE Transactions on Audio, Speech, and Language Processing, 19, 6, 1517-1529.
  • 17. Zao L., Coelho R. (2011), Colored noise based multicondition training technique for robust speaker identification, IEEE Signal Processing Letters, 18, 11, 675-678.
  • 18. Zao L., Coelho R., Flandrin P. (2014), Speech enhancement with EMD and Hurst-based mode selection, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22, 5, 899-911.
  • 19. Zeiler A., Faltermeier R., Keck I. R., Tomé A. M., Puntonet C. G., Lang E. W. (2010), Empirical mode decomposition – an introduction, [in:] 2010 IEEE International Joint Conference on Neural Networks (IJCNN), pp. 1-8, 18-23 July, Barcelona, Spain.
  • 20. Zhang Y., Tang Z. M., Li Y. P., Luo Y. (2014), A hierarchical framework approach for voice activity detection and speech enhancement, The Scientific World Journal, 2014, Article ID 723643, 8 pages.
  • 21. Zhao Y., Zhao X., Wang B. (2014), A speech enhancement method based on sparse reconstruction of power spectral density, Computers & Electrical Engineering, 40, 4, 1080-1089.
Notes
Record prepared under agreement 509/P-DUN/2018, financed by the Ministry of Science and Higher Education (MNiSW) from funds allocated to science dissemination activities (2019).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-a2d63370-8b42-442b-a81a-a53a38a7331f