Model-Based Feature Compensation for Robust Speech Recognition

Shen, H.; Li, Q.; Guo, J.; Liu, G.

Article - details

Article title

Model-Based Feature Compensation for Robust Speech Recognition

Authors

Shen H., Li Q., Guo J., Liu G.

Selected full texts from this journal

https://fi.episciences.org/

Abstract

This paper proposes a novel approach to robust speech recognition based on model-based feature compensation. The approach combines GMM-based and HMM-based feature compensation and uses multiple recognition passes to achieve the best performance. In the initial recognition pass, GMM-based feature compensation is used to obtain better clean-speech and noise models. These models are then further refined with HMM-based feature compensation. The statistical models of the clean speech and the noise are combined using the vector Taylor series (VTS) approximation. Experimental results show that the new approach yields a significant improvement over both GMM-based feature compensation and HMM-based feature compensation applied without any compensation in the initial pass.
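As an illustration of the VTS combination step mentioned in the abstract, the following is a minimal sketch, not the authors' implementation. It assumes the commonly used log-spectral mismatch function y = x + log(1 + exp(n - x)) for additive noise with no channel term, diagonal covariances, and a first-order expansion around the clean-speech and noise means. The function name `vts_combine` and its parameters are hypothetical; the record does not state the exact formulation used in the paper.

```python
import math

def vts_combine(mu_x, var_x, mu_n, var_n):
    """First-order VTS combination of two diagonal Gaussians (log-spectral domain).

    Assumption (not taken from the record): per dimension,
    y = x + log(1 + exp(n - x)), i.e. additive noise and no channel distortion.
    Returns the approximate mean and variance of noisy speech y.
    """
    mu_y, var_y = [], []
    for mx, vx, mn, vn in zip(mu_x, var_x, mu_n, var_n):
        # Jacobian dy/dx at the expansion point (mx, mn); dy/dn equals 1 - g.
        g = 1.0 / (1.0 + math.exp(mn - mx))
        # Zeroth-order term: the mismatch function evaluated at the means.
        mu_y.append(mx + math.log1p(math.exp(mn - mx)))
        # First-order term: variances propagated through the linearisation.
        var_y.append(g * g * vx + (1.0 - g) ** 2 * vn)
    return mu_y, var_y

# Dimension 0: speech and noise at equal level. Dimension 1: speech dominates.
mu_y, var_y = vts_combine([0.0, 20.0], [1.0, 1.0], [0.0, 0.0], [1.0, 4.0])
```

When speech dominates a dimension, the Jacobian approaches 1 and the noisy-speech Gaussian stays close to the clean one; when noise dominates, it approaches the noise Gaussian. In a model-based compensation scheme this combination would be applied to each Gaussian of the clean GMM or HMM.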

Keywords

robust speech recognition; feature compensation; EM algorithm

Publisher

IOS Press

Journal

Fundamenta Informaticae

Year

2006

Volume

Vol. 72, no. 4

Pages

529--539

Physical description

Bibliography: 20 items

Contributors

author

Shen H.

author

Li Q.

author

Guo J.

author

Liu G.

Xue Building 8, Room 1213, Beijing Univ. of Posts and Telecommunications, 100876, China, shen_hai_feng@126.com

References

[1] Moreno, P.J., Raj, B., Stern, R.M.: A Vector Taylor Series Approach for Environment-Independent Speech Recognition, Proc. ICASSP, 1995, 733-736.
[2] Moreno, P.J.: Speech Recognition in Noisy Environments, Ph.D. thesis, ECE Department, CMU, April 1996.
[3] Raj, B., Gouvea, E.B., Moreno, P.J., Stern, R.M.: Cepstral Compensation by Polynomial Approximation for Environment-Independent Speech Recognition, Proc. ICSLP, Philadelphia, 1996, 2340-2343.
[4] Kim, N.S.: Statistical Linear Approximation for Environment Compensation, IEEE Signal Processing Letters, Vol. 5, No. 1, 1998, 8-10.
[5] Shen, H.F., Liu, G., Guo, J., Li, Q.X.: Two-Domain Feature Compensation for Robust Speech Recognition, Proc. Second International Symposium on Neural Network (Wang, J., Liao, X., Yi, Z. Ed.), LNCS 3497, Springer-Verlag, Berlin, 2005, 351-356.
[6] Shen, H.F., Liu, G., Guo, J., Huang, P.M., Li, Q.X.: Environment Compensation Based on Maximum a Posteriori Estimation for Improved Speech Recognition, Proc. Fourth Mexican International Conference on Artificial Intelligence (Gelbukh, A., de Albornoz, A., Terashima, H. Ed.), LNAI 3789, Springer-Verlag, Berlin, 2005, 854-862.
[7] Gales, M.J.F.: Model-Based Techniques for Noise Robust Speech Recognition, Ph.D. thesis, University of Cambridge, September 1995.
[8] Acero, A., Li, D., Kristjansson, K., Zhang, J.: HMM Adaptation Using Vector Taylor Series for Noisy Speech Recognition, Proc. ICSLP 2000, Beijing, 2000.
[9] Shen, H.F., Li, Q.X., Guo, J., Liu, G.: HMM Parameter Adaptation Using the Truncated First-Order VTS and EM Algorithm for Robust Speech Recognition, Proc. 2005 International Conference on Computational Intelligence and Security, Lecture Notes in Artificial Intelligence (Hao, Y. Ed.), LNAI 3801, Springer-Verlag, Berlin, 2005, 979-984.
[10] Sarikaya, R., Hansen, J.H.: PCA-PMC: A Novel Use of a Priori Knowledge for Parallel Model Combination, Proc. ICASSP 2000, 2000, 1113-1116.
[11] Tai-Hwei, H., Hsiao-Chuan, W.: A Fast Algorithm for Parallel Model Combination for Noisy Speech Recognition, Computer Speech and Language, No. 14, 2000, 81-100.
[12] Sagayama, S., Yamaguchi, Y., Takahashi, S., Takahashi, J.: Jacobian Approach to Fast Acoustic Model Adaptation, Proc. ICASSP'97, Munich, Germany, 1997, 835-838.
[13] Sagayama, S., Kato, Y., Nakai, M., Shimodaira, H.: Jacobian Approach to Joint Adaptation to Noise, Channel and Vocal Tract Length, Proc. ISCA Workshop on Adaptation Methods, Sophia Antipolis, France, 2001, 117-120.
[14] Dempster, A.P., Laird, N.M., Rubin, D.B.: Maximum Likelihood from Incomplete Data via the EM Algorithm, Journal of the Royal Statistical Society B, 1977, 1-38.
[15] Gauvain, J.L., Lee, C.H.: Maximum A Posteriori Estimation for Multivariate Gaussian Mixture Observation of Markov Chains, IEEE Transactions on Speech and Audio Processing, Vol. 2, No. 2, 1994, 291-298.
[16] Huo, Q., Lee, C.H.: On-Line Adaptive Learning of the Continuous Density Hidden Markov Model Based on Approximate Recursive Bayes Estimate, IEEE Transactions on Speech and Audio Processing, Vol. 5, No. 2, 1997, 161-172.
[17] Huo, Q., Chan, C., Lee, C.H.: Bayesian Adaptive Learning of the Parameters of Hidden Markov Model for Speech Recognition, IEEE Transactions on Speech and Audio Processing, Vol. 3, No. 5, 1995, 334-345.
[18] Zu, Y.Q.: Issues in the Scientific Design of the Continuous Speech Database, Available: http://www.cass.net.cn/chinese/s18 yys/yuyin/report/report 1998.htm.
[19] Rabiner, L.R.: A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition, Proc. IEEE, Vol. 77, 1989, 257-286.
[20] Varga, A., Steeneken, H.J.M., Tomlinson, M., Jones, D.: The NOISEX-92 Study on the Effect of Additive Noise on Automatic Speech Recognition, Documentation on the NOISEX-92 CD-ROMs, 1992.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BUS2-0010-0084