Speech Enhancement by Short-Time Spectrum Estimation with Multivariate Laplace Speech Model

Zhou, B.; Zhang, X.; Zou, X.; Zhao, G.

Article - details

Selected full texts from this journal

http://pe.org.pl/

Title variants

Wieloczynnikowy model mowy Laplace'a w estymatorze spektrum krótkookresowego, na potrzeby polepszenia dźwięku

Abstracts

The paper presents a new short-time spectrum estimation algorithm for speech enhancement. A novel multivariate Laplace speech model is used to characterize the dependencies between adjacent DFT coefficients of speech, and a minimum mean-square error (MMSE) estimator of the speech spectral components is derived from this model. Moreover, speech presence uncertainty is incorporated to modify the MMSE estimator. Experimental results show that the developed algorithm achieves better noise suppression and lower speech distortion than existing speech enhancement methods.

The article presents a new algorithm for estimating the short-time speech spectrum for speech enhancement. A multivariate Laplace model is used to characterize the dependencies between the DFT components of speech. On this basis, a minimum mean-square error estimator is computed. Experimental results confirm improved noise suppression compared with the methods currently in use.

Keywords

speech enhancement; minimum mean-square error (MMSE); multivariate Laplace distribution

polepszenie dźwięku mowy; minimum błędu średniokwadratowego; wieloczynnikowy rozkład Laplace'a

Publisher

Wydawnictwo SIGMA-NOT

Journal

Przegląd Elektrotechniczny

Year

2012

Volume

Vol. 88, no. 12a

Pages

338-342

Physical description

Bibliography: 14 items, figures, tables.

Authors

author

Zhou B.

author

Zhang X.

author

Zou X.

author

Zhao G.

Institute of Command Automation, Haifu Xiang 1, Baixia District, Nanjing, China, 210007, binzhou86@yahoo.com.cn

Bibliography

[1] Y. Ephraim and D. Malah, “Speech enhancement using a minimum mean-square error short-time spectral amplitude estimator,” IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 6, pp. 1109-1121, Dec. 1984.
[2] S. Gazor and W. Zhang, “Speech probability distribution,” IEEE Signal Process. Lett., vol. 10, no. 7, pp. 204-207, Jul. 2003.
[3] R. Martin, “Speech enhancement based on minimum mean-square error estimation and supergaussian priors,” IEEE Trans. Speech Audio Process., vol. 13, no. 5, pp. 845-856, Sep. 2005.
[4] R. C. Hendriks, R. Heusdens, and J. Jensen, “Log-spectral magnitude MMSE estimators under super-Gaussian densities,” in Proc. INTERSPEECH, 2009, pp. 1319-1322.
[5] K. Paliwal, B. Schwerin, and K. Wojcicki, “Single channel speech enhancement using MMSE estimation of short-time modulation magnitude spectrum,” in Proc. INTERSPEECH, Florence, Italy, 2011, pp. 1209-1212.
[6] J. S. Erkelens, R. C. Hendriks, R. Heusdens, and J. Jensen, “Minimum mean-square error estimation of discrete Fourier coefficients with generalized Gamma priors,” IEEE Trans. Audio, Speech, Lang. Process., vol. 15, no. 6, pp. 1741-1752, Aug. 2007.
[7] T. Esch and P. Vary, “Model-based speech enhancement using SNR dependent MMSE estimation,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Prague, Czech Republic, May 2011, pp. 4652-4655.
[8] B. J. Borgstrom and A. Alwan, “Log-spectral amplitude estimation with generalized Gamma distributions for speech enhancement,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., Prague, Czech Republic, May 2011, pp. 4756-4759.
[9] C. Li and S. V. Andersen, “A block-based linear MMSE noise reduction with a high temporal resolution modeling of the speech excitation,” EURASIP J. Appl. Signal Process., vol. 18, pp. 2965-2978, 2005.
[10] E. Plourde and B. Champagne, “Multi-dimensional Bayesian STSA estimators for the enhancement of speech with correlated frequency components,” IEEE Trans. Signal Process., vol. 59, no. 7, pp. 3013-3024, Jul. 2011.
[11] I. W. Selesnick, “The estimation of Laplace random vectors in additive white Gaussian noise,” IEEE Trans. Signal Process., vol. 56, no. 8, pp. 3482-3496, Aug. 2008.
[12] J. Portilla, V. Strela, M. J. Wainwright, and E. P. Simoncelli, “Image denoising using scale mixtures of Gaussians in the wavelet domain,” IEEE Trans. Image Process., vol. 12, no. 11, pp. 1338-1351, Nov. 2003.
[13] R. Martin, “Noise power spectral density estimation based on optimal smoothing and minimum statistics,” IEEE Trans. Speech Audio Process., vol. 9, no. 5, pp. 504-512, Jul. 2001.
[14] Perceptual Evaluation of Speech Quality (PESQ) and Objective Method for End-to-End Speech Quality Assessment of Narrowband Telephone Networks and Speech Codecs, ITU-T Rec. P.862, 2001.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BPS1-0050-0097