

Article title

Speech Enhancement Based on Discrete Wavelet Packet Transform and Itakura-Saito Nonnegative Matrix Factorisation

Publication language
EN
Abstract
EN
Nonnegative matrix factorization (NMF) is one of the most popular machine learning tools for speech enhancement (SE). However, two problems degrade the performance of traditional NMF-based SE algorithms. The first is the overlap-and-add operation used in signal reconstruction based on the short-time Fourier transform (STFT); the second is the Euclidean distance commonly used as the objective function. Both can introduce distortion into the enhanced speech. To overcome these shortcomings, we propose a novel joint SE framework that combines the discrete wavelet packet transform (DWPT) with Itakura-Saito nonnegative matrix factorisation (ISNMF). In this approach, the speech signal is first split into a series of subband signals using the DWPT. ISNMF is then applied to enhance the speech in each subband signal. Finally, the inverse DWPT (IDWPT) reconstructs the enhanced speech from these subband signals. The experimental results show that the proposed joint framework effectively improves speech enhancement performance and, compared with traditional NMF methods, performs better in the unseen noise case.
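The abstract describes a three-stage pipeline: DWPT analysis, per-subband ISNMF, and inverse DWPT synthesis. A minimal NumPy sketch of those building blocks follows, under stated assumptions. The Haar wavelet is a stand-in chosen only because its filters are two taps long (the abstract does not name the wavelet the authors use), and the function names `haar_dwpt`, `haar_idwpt`, `is_divergence` and `is_nmf` are hypothetical. The IS-NMF updates are the standard multiplicative rules for the β-divergence with β = 0, raised to the power 1/2, the majorization-minimization exponent for β < 1 that keeps the cost non-increasing (see, e.g., ref. 22). How the nonnegative matrix for each subband is built, and how speech and noise bases are trained, is not stated in the abstract, so the sketch does not attempt the full enhancement step.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)


def haar_dwpt(x, level):
    """Full Haar wavelet packet decomposition to `level` levels (assumed wavelet).

    Returns the 2**level leaf subbands in natural (Paley) order, which is not
    sorted by frequency. Assumes len(x) is divisible by 2**level.
    """
    bands = [np.asarray(x, dtype=float)]
    for _ in range(level):
        split = []
        for b in bands:
            even, odd = b[0::2], b[1::2]
            split.append((even + odd) / SQRT2)  # low-pass (approximation) child
            split.append((even - odd) / SQRT2)  # high-pass (detail) child
        bands = split
    return bands


def haar_idwpt(bands):
    """Inverse of haar_dwpt: merge sibling subbands pairwise back up the tree."""
    bands = list(bands)
    while len(bands) > 1:
        merged = []
        for a, d in zip(bands[0::2], bands[1::2]):
            out = np.empty(2 * len(a))
            out[0::2] = (a + d) / SQRT2
            out[1::2] = (a - d) / SQRT2
            merged.append(out)
        bands = merged
    return bands[0]


def is_divergence(V, V_hat):
    """Itakura-Saito divergence summed over entries; V and V_hat must be > 0."""
    R = V / V_hat
    return float(np.sum(R - np.log(R) - 1.0))


def is_nmf(V, rank, n_iter=200, seed=0):
    """IS-NMF (beta-divergence with beta = 0) via multiplicative updates.

    The 0.5 exponent on each update ratio is the majorization-minimization
    form for beta < 1, which keeps the IS cost non-increasing.
    Returns W, H and the cost after initialization and after every iteration.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, rank)) + 0.1  # strictly positive initialization
    H = rng.random((rank, N)) + 0.1
    costs = [is_divergence(V, W @ H)]
    for _ in range(n_iter):
        V_hat = W @ H
        H *= ((W.T @ (V * V_hat ** -2)) / (W.T @ (V_hat ** -1))) ** 0.5
        V_hat = W @ H
        W *= (((V * V_hat ** -2) @ H.T) / ((V_hat ** -1) @ H.T)) ** 0.5
        costs.append(is_divergence(V, W @ H))
    return W, H, costs
```

Because this Haar packet is orthonormal, `haar_idwpt(haar_dwpt(x, L))` returns `x` up to rounding whenever `len(x)` is divisible by `2**L`, so the synthesis stage needs no overlap-add step. The IS divergence is scale-invariant (scaling both arguments by the same λ leaves it unchanged), one property often cited for preferring it over the Euclidean distance on audio power spectra with a large dynamic range.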
Year
Pages
565–572
Physical description
Bibliography: 40 items, figures, graphs
Authors
author
  • School of Mechatronic Engineering, China University of Mining and Technology, Xuzhou 221116, China
author
  • School of Mechatronic Engineering, China University of Mining and Technology, Xuzhou 221116, China
author
  • School of Mechatronic Engineering, China University of Mining and Technology, Xuzhou 221116, China
author
  • School of Mechatronic Engineering, China University of Mining and Technology, Xuzhou 221116, China
author
  • School of Mechatronic Engineering, China University of Mining and Technology, Xuzhou 221116, China
author
  • School of Mechatronic Engineering, China University of Mining and Technology, Xuzhou 221116, China
References
  • 1. Bavkar S., Sahare S. (2013), PCA based single channel speech enhancement method for highly noisy environment, Proceedings of International Conference on Advances in Computing, pp. 1103-1107, Mysore, doi: 10.1109/ICACCI.2013.6637331.
  • 2. Boll S. (1979), Suppression of acoustic noise in speech using spectral subtraction, IEEE Transactions on Acoustics Speech & Signal Processing, 27 (2): 113-120, doi: 10.1109/TASSP.1979.1163209.
  • 3. Bouzid A., Ellouze N. (2016), Speech enhancement based on wavelet packet of an improved principal component analysis, Computer Speech & Language, 35: 58-72, doi: 10.1016/j.csl.2015.06.001.
  • 4. Chien J. T., Yang P. K. (2015), Bayesian factorization and learning for monaural source separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24 (1): 185-195, doi: 10.1109/TASLP.2015.2502141.
  • 5. Coifman R. R., Wickerhauser M. V. (1992), Entropy-based algorithms for best basis selection, IEEE Transactions on Information Theory, 38 (2): 713-718, doi: 10.1109/18.119732.
  • 6. Févotte C., Le Roux J., Hershey J. R. (2013), Non-negative dynamical system with application to speech and audio, Proceedings of 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3158-3162, Vancouver, doi: 10.1109/ICASSP.2013.6638240.
  • 7. Gokhale M., Khanduja D. K. (2010), Time domain signal analysis using wavelet packet decomposition approach, International Journal of Communications, Network and System Sciences, 3 (3): 321-329, doi: 10.4236/ijcns.2010.33041.
  • 8. Grancharov V., Samuelsson J., Kleijn B. (2006), On causal algorithms for speech enhancement, IEEE Transactions on Speech & Audio Processing, 14 (3): 764-773, doi: 10.1109/TSA.2005.857802.
  • 9. Hansen J. H., Pellom B. L. (1998), An effective quality evaluation protocol for speech enhancement algorithms, Proceedings of Fifth International Conference on Spoken Language Processing, pp. 917-921, Sydney.
  • 10. Islam M. S., Al Mahmud T. H., Khan W. U., Ye Z. (2019), Supervised single channel speech enhancement based on dual-tree complex wavelet transforms and nonnegative matrix factorization using the joint learning process and subband smooth ratio mask, Electronics, 8 (3): 353-371, doi: 10.3390/electronics8030353.
  • 11. Krawczyk-Becker M., Gerkmann T. (2016), An evaluation of the perceptual quality of phase-aware single-channel speech enhancement, Journal of the Acoustical Society of America, 140 (4): EL364-EL369, doi: 10.1121/1.4965288.
  • 12. Lai Y.-H., Chen F., Wang S.-S., Lu X., Tsao Y., Lee C.-H. (2016), A deep denoising autoencoder approach to improving the intelligibility of vocoded speech in cochlear implant simulation, IEEE Transactions on Biomedical Engineering, 64 (7): 1568-1578, doi: 10.1109/TBME.2016.2613960.
  • 13. Lee D. D., Seung H. S. (1999), Learning the parts of objects by non-negative matrix factorization, Nature, 401 (6755): 788-791, doi: 10.1038/44565.
  • 14. Lee S., Han D. K., Ko H. (2017), Single-channel speech enhancement method using reconstructive NMF with spectrotemporal speech presence probabilities, Applied Acoustics, 117: 257-262, doi: 10.1016/j.apacoust.2016.04.024.
  • 15. Li J., Sakamoto S., Hongo S., Akagi M., Suzuki Y. I. (2011), Two-stage binaural speech enhancement with Wiener filter for high-quality speech communication, Speech Communication, 53 (5): 677-689, doi: 10.1016/j.specom.2010.04.009.
  • 16. Li Y., Zhang X., Sun M. (2017), Robust Nonnegative matrix factorization with β-divergence for speech separation, ETRI Journal, 39 (1): 21-29, doi: 10.4218/etrij.17.0115.0122.
  • 17. Luts H. et al. (2010), Multicenter evaluation of signal enhancement algorithms for hearing aids, Journal of the Acoustical Society of America, 127 (3): 1491-1505, doi: 10.1121/1.3299168.
  • 18. Magron P., Virtanen T. (2018), Expectation-maximization algorithms for Itakura-Saito nonnegative matrix factorization, Proceedings of 2018 Conference of the International Speech Communication Association (INTERSPEECH), pp. 856-860, Graz, doi: 10.21437/Interspeech.2018-1840.
  • 19. Mavaddaty S., Ahadi S. M., Seyedin S. (2017), Speech enhancement using sparse dictionary learning in wavelet packet transform domain, Computer Speech & Language, 44: 22-47, doi: 10.1016/j.csl.2017.01.009.
  • 20. Mohammadiha N., Smaragdis P., Leijon A. (2013), Supervised and unsupervised speech enhancement using nonnegative matrix factorization, IEEE Transactions on Audio, Speech, and Language Processing, 21 (10): 2140-2151, doi: 10.1109/TASL.2013.2270369.
  • 21. Mowlaee P., Saeidi R. (2014), Time-frequency constraints for phase estimation in single-channel speech enhancement, Proceedings of 2014 14th International Workshop on Acoustic Signal Enhancement, pp. 337-341, Juan-les-Pins, doi: 10.1109/IWAENC.2014.6954314.
  • 22. Nakano M., Kameoka H., Le Roux J., Kitano Y., Ono N., Sagayama S. (2010), Convergence-guaranteed multiplicative algorithms for nonnegative matrix factorization with β-divergence, Proceedings of 2010 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), pp. 283-288, Kittila, doi: 10.1109/MLSP.2010.5589233.
  • 23. Nie S., Liang S., Liu W., Zhang X., Tao J. (2018), Deep learning based speech separation via NMF-style reconstructions, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26 (11): 2043-2055, doi: 10.1109/TASLP.2018.2851151.
  • 24. Panfili L. M., Haywood J., McCloy D. R., Souza P. E., Wright R. A. (2017), The UW/NU Corpus, Version 2.0, https://depts.washington.edu/phonlab/projects/uw-nu.php.
  • 25. Rix A. W., Beerends J. G., Hollier M. P., Hekstra A. P. (2001), Perceptual evaluation of speech quality (PESQ): a new method for speech quality assessment of telephone networks and codecs, Proceedings of 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 749-752, Salt Lake City, doi: 10.1109/ICASSP.2001.941023.
  • 26. Saleem N., Khattak M. I., Ali M. Y., Shafi M. (2019), Deep neural network for supervised single-channel speech enhancement, Archives of Acoustics, 44 (1): 3-12, doi: 10.24425/aoa.2019.126347.
  • 27. Saleem N., Khattak M. I., Shafi M. (2018), Unsupervised speech enhancement in low SNR environments via sparseness and temporal gradient regularization, Applied Acoustics, 141: 333-347, doi: 10.1016/j.apacoust.2018.07.027.
  • 28. Scalart P., Filho J. V. (1996), Speech enhancement based on a priori signal to noise estimation, Proceedings of 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 629-632, Atlanta, doi: 10.1109/ICASSP.1996.543199.
  • 29. Sun D. L., Févotte C. (2014), Alternating direction method of multipliers for non-negative matrix factorization with the beta-divergence, Proceedings of 2014 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 6201-6205, Florence, doi: 10.1109/ICASSP.2014.6854796.
  • 30. Sun P., Qin J. (2016), Wavelet packet transform based speech enhancement via two-dimensional SPP estimator with generalized gamma priors, Archives of Acoustics, 41 (3): 579-590, doi: 10.1515/aoa-2016-0056.
  • 31. Taal C. H., Hendriks R. C., Heusdens R., Jensen J. (2011), An algorithm for intelligibility prediction of time-frequency weighted noisy speech, IEEE Transactions on Audio, Speech, and Language Processing, 19 (7): 2125-2136, doi: 10.1109/TASL.2011.2114881.
  • 32. Varga A., Steeneken H. J. (1993), Assessment for automatic speech recognition: II. NOISEX-92: A database and an experiment to study the effect of additive noise on speech recognition systems, Speech Communication, 12 (3): 247-251, doi: 10.1016/0167-6393(93)90095-3.
  • 33. Varshney Y. V., Abbasi Z. A., Abidi M. R., Farooq O. (2017), Frequency selection based separation of speech signals with reduced computational time using sparse NMF, Archives of Acoustics, 42 (2): 287-295, doi: 10.1515/aoa-2017-0031.
  • 34. Veisi H., Sameti H., Aroudi A. (2015), Hidden Markov model-based speech enhancement using multivariate Laplace and Gaussian distributions, IET Signal Processing, 9 (2): 177-185, doi: 10.1049/iet-spr.2014.0032.
  • 35. Wang D., Jiang M., Niu F., Cao Y., Zhou C. (2018a), Speech Enhancement Control Design Algorithm for Dual-Microphone Systems Using β-NMF in a Complex Environment, Complexity, 2018, Article ID 6153451, doi: 10.1155/2018/6153451.
  • 36. Wang D., Chen J. (2018), Supervised speech separation based on deep learning: An overview, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26 (10): 1702-1726, doi: 10.1109/TASLP.2018.2842159.
  • 37. Wang D., Hansen J. H. L. (2018), Speech enhancement for cochlear implant recipients, Journal of the Acoustical Society of America, 143 (4): 2244-2254, doi: 10.1121/1.5031112.
  • 38. Wang M., Zhang E., Tang Z. (2018b), Speech enhancement based on NMF under electric vehicle noise condition, IEEE Access, 6: 9147-9159, doi: 10.1109/ACCESS.2018.2797165.
  • 39. Wang S. S., Chern A., Tsao Y., Hung J. W., Lai Y. H., Su B. (2016), Wavelet speech enhancement based on nonnegative matrix factorization, IEEE Signal Processing Letters, 23 (8): 1101-1105, doi: 10.1109/LSP.2016.2571727.
  • 40. Wang S. S. et al. (2015), Improving denoising autoencoder based speech enhancement with the speech parameter generation algorithm, Proceedings of 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), pp. 365-369, Hong Kong, doi: 10.1109/APSIPA.2015.7415295.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-c45bc411-8bc3-4b72-8fa4-52339090735d