Wavelet Packet Transform based Speech Enhancement via Two-Dimensional SPP Estimator with Generalized Gamma Priors

Sun, P.; Qin, J.

doi:10.1515/aoa-2016-0056

Abstract

Although various speech enhancement techniques have been developed for different applications, existing methods remain limited in environments with high ambient noise levels. Speech presence probability (SPP) estimation is a speech enhancement technique used to reduce speech distortion, especially in low signal-to-noise ratio (SNR) scenarios. In this paper, we propose a new SPP estimator, improved by two-dimensional (2D) Teager energy operators (TEOs), for speech enhancement in the time-frequency (T-F) domain. The wavelet packet transform (WPT), a multiband decomposition technique, is used to concentrate the energy distribution of speech components. A minimum mean-square error (MMSE) estimator is derived from a generalized gamma distribution speech model in the WPT domain. Speech samples corrupted by environmental and occupational noises (i.e., machine shop, factory, and station) at different input SNRs are used to validate the proposed algorithm. Results suggest that the proposed method achieves a significant improvement in perceptual quality compared with four conventional speech enhancement algorithms (i.e., MMSE-84, MMSE-04, Wiener-96, and BTW).
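The method builds on the Teager energy operator. As a minimal illustrative sketch, not the authors' implementation, the discrete Teager-Kaiser operator Ψ[x(n)] = x²(n) − x(n−1)x(n+1) and one common 2D extension can be written in plain Python. The 2D form used here (the Maragos-Bovik image form, applied to a grid indexed by frame and WPT subband) is an assumption; the paper's exact 2D TEO over WPT coefficients may differ. The names teo_1d and teo_2d are hypothetical.

```python
import math

# Hypothetical helpers for illustration only; not the authors' code.


def teo_1d(x):
    """Discrete Teager-Kaiser energy: psi[n] = x[n]**2 - x[n-1] * x[n+1].

    Only interior samples are computed, so the result has len(x) - 2 values.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def teo_2d(c):
    """2-D discrete Teager energy on a grid c[m][n] (assumed Maragos-Bovik form):

    psi[m][n] = 2*c[m][n]**2 - c[m-1][n]*c[m+1][n] - c[m][n-1]*c[m][n+1]

    On a time-frequency grid, m and n could index frames and WPT subbands.
    Only interior points are computed, giving a (rows-2) x (cols-2) grid.
    """
    rows, cols = len(c), len(c[0])
    return [
        [
            2 * c[m][n] ** 2
            - c[m - 1][n] * c[m + 1][n]
            - c[m][n - 1] * c[m][n + 1]
            for n in range(1, cols - 1)
        ]
        for m in range(1, rows - 1)
    ]


if __name__ == "__main__":
    amp, omega = 0.5, 0.3
    tone = [amp * math.cos(omega * n + 0.1) for n in range(32)]
    # For a pure tone, every TEO value should equal amp**2 * sin(omega)**2.
    print(teo_1d(tone)[:3], amp ** 2 * math.sin(omega) ** 2)
```

For a pure tone A cos(Ωn + φ), the 1D operator returns the constant A² sin²(Ω), so it tracks amplitude and frequency together; the 2D form applied to a plane wave cos(Ω₁m + Ω₂n + φ) returns sin²(Ω₁) + sin²(Ω₂). How the paper maps TEO values to an SPP estimate is not described in this record.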

Keywords

speech enhancement; speech presence probability; wavelet packet transform; two-dimensional Teager energy operator

Publisher

Instytut Podstawowych Problemów Techniki PAN (Institute of Fundamental Technological Research)
Polska Akademia Nauk (Polish Academy of Sciences)

Journal

Archives of Acoustics

Year

2016

Volume

Vol. 41, No. 3

Pages

579–590

Physical description

Bibliography: 34 items; figures, tables, charts

Authors

Author

Sun P.

Department of Electrical and Computer Engineering, Southern Illinois University Carbondale, 1230 Lincoln Drive, Mail Code 6603 Carbondale, IL 62901, USA

Author

Qin J.

jqin@siu.edu

Department of Electrical and Computer Engineering, Southern Illinois University Carbondale, 1230 Lincoln Drive, Mail Code 6603 Carbondale, IL 62901, USA

References

1. AudioMiCro, Free Industrial and Machinery Sound Effects, retrieved November 29, 2015, from http://www.audiomicro.com/free-sound-effects/freeindustrial-and-machinery/.
2. Bahoura M., Rouat J. (2006), Wavelet speech enhancement based on time-scale adaptation, Speech Communication, 48, 12, 1620–1637.
3. Bahoura M., Rouat J. (2001), Wavelet speech enhancement based on the Teager energy operator, IEEE Signal Processing Letters, 8, 1, 10–12.
4. Boll S. F. (1979), Suppression of acoustic noise in speech using spectral subtraction, IEEE Transactions on Acoustics, Speech, and Signal Processing, 27, 2, 113–120.
5. Bovik A. C., Maragos P., Quatieri T. F. (1993), AM-FM energy detection and separation in noise using multiband energy operators, IEEE Transactions on Signal Processing, 41, 12, 3245–3265.
6. Chang S. G., Yu B., Vetterli M. (2000), Adaptive wavelet thresholding for image denoising and compression, IEEE Transactions on Image Processing, 9, 9, 1532–1546.
7. Cohen I., Berdugo B. (2001), Speech enhancement for non-stationary noise environments, Signal Processing, 81, 11, 2403–2418.
8. Cohen I. (2003), Noise spectrum estimation in adverse environments: Improved minima controlled recursive averaging, IEEE Transactions on Speech and Audio Processing, 11, 5, 466–475.
9. Cohen I. (2004), Speech enhancement using a noncausal a priori SNR estimator, IEEE Signal Processing Letters, 11, 9, 725–728.
10. Dunn R. B., Quatieri T. F., Kaiser J. F. (1993), Detection of transient signals using the energy operator, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-93), pp. 145–148.
11. Ephraim Y., Malah D. (1984), Speech enhancement using a minimum-mean square error short-time spectral amplitude estimator, IEEE Transactions on Acoustics, Speech, and Signal Processing, 32, 6, 1109–1121.
12. Ephraim Y., Van Trees H. L. (1995), A signal subspace approach for speech enhancement, IEEE Transactions on Speech and Audio Processing, 3, 4, 251–266.
13. Erkelens J. S., Hendriks R. C., Heusdens R., Jensen J. (2007), Minimum mean-square error estimation of discrete Fourier coefficients with generalized gamma priors, IEEE Transactions on Audio, Speech, and Language Processing, 15, 6, 1741–1752.
14. Fisher E., Tabrikian J., Dubnov S. (2006), Generalized likelihood ratio test for voiced-unvoiced decision in noisy speech using the harmonic model, IEEE Transactions on Audio, Speech, and Language Processing, 14, 2, 502–510.
15. Gerkmann T., Breithaupt C., Martin R. (2008), Improved a posteriori speech presence probability estimation based on a likelihood ratio with fixed priors, IEEE Transactions on Audio, Speech, and Language Processing, 16, 5, 910–919.
16. Ghanbari Y., Karami-Mollaei M. R. (2006), A new approach for speech enhancement based on the adaptive thresholding of the wavelet packets, Speech Communication, 48, 8, 927–940.
17. Hendriks R. C., Gerkmann T., Jensen J. (2013), DFT-domain based single-microphone noise reduction for speech enhancement: a survey of the state of the art, Synthesis Lectures on Speech and Audio Processing, 9, 1, 80–84.
18. Hu Y., Loizou P. C. (2004), Speech enhancement based on wavelet thresholding the multitaper spectrum, IEEE Transactions on Speech and Audio Processing, 12, 1, 59–67.
19. Hu Y., Loizou P. C. (2007), Subjective comparison and evaluation of speech enhancement algorithms, Speech Communication, 49, 7, 588–601.
20. Johnson M. T., Yuan X., Ren Y. (2007), Speech signal enhancement through adaptive wavelet thresholding, Speech Communication, 49, 2, 123–133.
21. Kaiser J. F. (1993), Some useful properties of Teager's energy operators, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-93), pp. 149–152.
22. Kandia V., Stylianou Y. (2006), Detection of sperm whale clicks based on the Teager-Kaiser energy operator, Applied Acoustics, 67, 11, 1144–1163.
23. Langner B., Black A. W. (2004), Creating a database of speech in noise for unit selection synthesis, Fifth ISCA Workshop on Speech Synthesis, pp. 229–230.
24. Loizou P. C. (2013), Speech enhancement: theory and practice, CRC Press.
25. Martin R. (2002), Speech enhancement using MMSE short time spectral estimation with gamma distributed speech priors, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2002), pp. 253–256.
26. Martin R. (2005), Speech enhancement based on minimum mean-square error estimation and supergaussian priors, IEEE Transactions on Speech and Audio Processing, 13, 5, 845–856.
27. Mohammadiha N., Martin R., Leijon A. (2013), Spectral domain speech enhancement using HMM state-dependent super-Gaussian priors, IEEE Signal Processing Letters, 20, 3, 253–256.
28. Park J., Kim J.-W., Chang J.-H., Jin Y. G., Kim N. S. (2015), Estimation of speech absence uncertainty based on multiple linear regression analysis for speech enhancement, Applied Acoustics, 87, 205–211.
29. Sanam T. F., Shahnaz C. (2013), Noisy speech enhancement based on an adaptive threshold and a modified hard thresholding function in wavelet packet domain, Digital Signal Processing, 23, 3, 941–951.
30. Scalart P. (1996), Speech enhancement based on a priori signal to noise estimation, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-96), pp. 629–632.
31. Simoncelli E. P., Adelson E. H. (1996), Noise removal via Bayesian wavelet coring, Proceedings of the International Conference on Image Processing, pp. 379–382.
32. Tasmaz H., Ercelebi E. (2008), Speech enhancement based on undecimated wavelet packet-perceptual filterbanks and MMSE-STSA estimation in various noise environments, Digital Signal Processing, 18, 5, 797–812.
33. Weickert T., Benjaminsen C., Kiencke U. (2008), Analytic complex wavelet packets for speech enhancement, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2008), pp. 3269–3272.
34. Ying G., Mitchell C., Jamieson L. (1993), Endpoint detection of isolated utterances based on a modified Teager energy measurement, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-93), pp. 732–735.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-4610a6f0-3a15-45f7-ab93-f8392d0d4fb5