Speech Enhancement Based on Discrete Wavelet Packet Transform and Itakura-Saito Nonnegative Matrix Factorisation

Liu, Houguang; Wang, Wenbo; Xue, Lin; Yang, Jianhua; Wang, Zhihua; Hua, Chunli

doi:10.24425/aoa.2020.134072

Abstract

Nonnegative matrix factorisation (NMF) is one of the most popular machine learning tools for speech enhancement (SE). However, two problems limit the performance of traditional NMF-based SE algorithms. The first is the overlap-and-add operation used in signal reconstruction based on the short-time Fourier transform (STFT); the second is the Euclidean distance commonly used as the objective function. Both can introduce distortion into the SE process. To overcome these shortcomings, we propose a novel joint SE framework that combines the discrete wavelet packet transform (DWPT) with Itakura-Saito nonnegative matrix factorisation (ISNMF). In this approach, the speech signal is first split into a series of subband signals using the DWPT. The ISNMF is then applied to each subband signal to enhance the speech. Finally, the inverse DWPT (IDWPT) reconstructs the signal from the enhanced subband signals. The experimental results show that the proposed joint framework improves speech enhancement performance and outperforms traditional NMF methods in the unseen-noise case.
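The abstract contrasts the Euclidean distance with the Itakura-Saito (IS) divergence as an NMF objective. The record does not give the authors' exact formulation, so the following is only a minimal sketch of the standard scalar definition, d_IS(x | y) = x/y - log(x/y) - 1. It illustrates one commonly cited property: the IS divergence depends only on the ratio x/y, whereas the squared Euclidean distance scales with signal power. The function names and the sample values are illustrative assumptions, not taken from the paper.

```python
import math


def is_divergence(x, y):
    """Scalar Itakura-Saito divergence d_IS(x | y) for x, y > 0."""
    if x <= 0 or y <= 0:
        raise ValueError("x and y must be strictly positive")
    ratio = x / y
    return ratio - math.log(ratio) - 1.0


def squared_euclidean(x, y):
    """Squared Euclidean distance between two scalars."""
    return (x - y) ** 2


# Hypothetical (observed, modelled) power values with the same 2:1 ratio,
# once in a high-energy bin and once in a low-energy bin.
loud = (2.0, 1.0)
quiet = (0.002, 0.001)

for name, cost in (("Itakura-Saito", is_divergence),
                   ("squared Euclidean", squared_euclidean)):
    # Print the cost of the same relative error at both energy levels.
    print(name, cost(*loud), cost(*quiet))
```

Under this definition the IS cost is the same for both pairs, while the squared Euclidean cost of the low-energy pair is smaller by several orders of magnitude, so a Euclidean objective tends to be dominated by the high-energy components.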

Keywords

speech enhancement; discrete wavelet packet transform; nonnegative matrix factorisation; Itakura-Saito divergence

Publisher

Instytut Podstawowych Problemów Techniki PAN
Komitet Akustyki PAN
Polskie Towarzystwo Akustyczne

Journal

Archives of Acoustics

Year

2020

Volume

Vol. 45, No. 4

Pages

565-572

Physical description

Bibliography: 40 items, figures, charts.

Authors

author

Liu Houguang

School of Mechatronic Engineering, China University of Mining and Technology, Xuzhou 221116, China

author

Wang Wenbo

wangwenbo@cumt.edu.cn

School of Mechatronic Engineering, China University of Mining and Technology, Xuzhou 221116, China

author

Xue Lin

School of Mechatronic Engineering, China University of Mining and Technology, Xuzhou 221116, China

author

Yang Jianhua

School of Mechatronic Engineering, China University of Mining and Technology, Xuzhou 221116, China

author

Wang Zhihua

School of Mechatronic Engineering, China University of Mining and Technology, Xuzhou 221116, China

author

Hua Chunli

School of Mechatronic Engineering, China University of Mining and Technology, Xuzhou 221116, China

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-c45bc411-8bc3-4b72-8fa4-52339090735d