Phase Autocorrelation Bark Wavelet Transform (PACWT) Features for Robust Speech Recognition

Majeed, S. A.; Husain, H.; Samad, S. A.

doi:10.1515/aoa-2015-0004

Abstract

In this paper, a new feature-extraction method is proposed to improve the robustness of speech recognition systems. The method combines the benefits of phase autocorrelation (PAC) with the Bark wavelet transform. PAC measures correlation by the angle between a signal frame and its shifted version rather than by the traditional autocorrelation value, whereas the Bark wavelet transform is a wavelet transform designed specifically for speech signals. The features extracted by this combined method are called phase autocorrelation Bark wavelet transform (PACWT) features. The speech recognition performance of the PACWT features is evaluated and compared with that of the conventional mel-frequency cepstral coefficient (MFCC) features on the TI-Digits database under different noise types and noise levels. The database is divided into male and female data. The results show that the word recognition rate for noisy male data (white noise at 0 dB SNR) is 60% with the PACWT features, compared with 41.35% with the MFCC features under identical conditions.
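
The angle-based correlation described in the abstract can be illustrated with a minimal sketch. It assumes the general PAC definition from Ikbal et al. (2012): the circular autocorrelation of a frame satisfies R[k] = ||x||^2 cos(theta_k), and the PAC coefficient is the angle theta_k itself. The function name `phase_autocorrelation` and the use of NumPy are illustrative assumptions and are not taken from the paper; the Bark wavelet stage of PACWT is not shown.

```python
import numpy as np


def phase_autocorrelation(frame):
    """Return the PAC angles theta_k (radians) for one signal frame.

    Hypothetical helper for illustration only, not the authors' code.
    """
    x = np.asarray(frame, dtype=float)
    energy = np.dot(x, x)  # ||x||^2, which equals R[0]
    if energy == 0.0:
        raise ValueError("PAC is undefined for an all-zero frame")

    # Circular autocorrelation: R[k] = sum_n x[n] * x[(n + k) mod N]
    r = np.array([np.dot(x, np.roll(x, -k)) for k in range(x.size)])

    # R[k] = ||x||^2 * cos(theta_k), so theta_k = arccos(R[k] / ||x||^2).
    # Clipping guards against rounding slightly outside [-1, 1].
    cos_theta = np.clip(r / energy, -1.0, 1.0)
    return np.arccos(cos_theta)


if __name__ == "__main__":
    n = np.arange(8)
    frame = np.cos(2 * np.pi * n / 8)  # one period of a cosine
    print(phase_autocorrelation(frame))
```

Because each coefficient is normalized by the frame energy, multiplying the frame by a constant gain leaves the angles unchanged.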

Keywords

speech recognition; feature extraction; phase autocorrelation; wavelet transform

Publisher

Instytut Podstawowych Problemów Techniki PAN
Komitet Akustyki PAN
Polskie Towarzystwo Akustyczne

Journal

Archives of Acoustics

Year

2015

Volume

Vol. 40, No. 1

Pages

25–31

Physical description

Bibliography: 27 items, tables, charts.

Authors

author

Majeed S. A.

Sayf_alali@yahoo.com

Department of Electrical, Electronic and Systems Engineering, Faculty of Engineering and Built Environment, National University of Malaysia, UKM, Bangi, Selangor, Malaysia

author

Husain H.

Department of Electrical, Electronic and Systems Engineering, Faculty of Engineering and Built Environment, National University of Malaysia, UKM, Bangi, Selangor, Malaysia

author

Samad S. A.

Department of Electrical, Electronic and Systems Engineering, Faculty of Engineering and Built Environment, National University of Malaysia, UKM, Bangi, Selangor, Malaysia

Bibliography

1. Addison P.S. (2010), The illustrated wavelet transform handbook: introductory theory and applications in science, engineering, medicine and finance, CRC Press.
2. Boll S. (1979), Suppression of acoustic noise in speech using spectral subtraction, IEEE Transactions on Acoustics, Speech and Signal Processing, 27, 2, 113–120.
3. Chang C.C., Lin C.J. (2011), LIBSVM: a library for support vector machines, ACM Transactions on Intelligent Systems and Technology (TIST), 2, 3, 27.
4. Davis S., Mermelstein P. (1980), Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences, IEEE Transactions on Acoustics, Speech and Signal Processing, 28, 4, 357–366.
5. Gabor D. (1947), Acoustical quanta and the theory of hearing, Nature, 159, 4044, 591–594.
6. Ikbal S., Misra H., Hermansky H., Magimai-Doss M. (2012), Phase AutoCorrelation (PAC) features for noise robust speech recognition, Speech Communication, 54, 7, 867–880.
7. Jie Y., Zhenli W. (2009), On the application of variable-step adaptive noise cancelling for improving the robustness of speech recognition, Computing, Communication, Control, and Management, CCCM 2009, ISECS International Colloquium on, IEEE.
8. Jolliffe I. (2005), Principal component analysis, Wiley Online Library.
9. Leonard R. (1984), A database for speaker-independent digit recognition, IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP’84, IEEE.
10. Liu F.H., Stern R.M., Huang X., Acero A. (1993), Efficient cepstral normalization for robust speech recognition, Proceedings of the workshop on Human Language Technology, Association for Computational Linguistics.
11. Majeed S., Husain H., Samad S., Hussain A. (2012), Hierarchical K-Means Algorithm Applied On Isolated Malay Digit Speech Recognition, International Proceedings of Computer Science & Information Technology, 34, 33–37.
12. Mansour D., Juang B.H. (1989), A family of distortion measures based upon projection operation for robust speech recognition, IEEE Transactions on Acoustics, Speech and Signal Processing, 37, 11, 1659–1671.
13. Nasersharif B., Akbari A. (2007), SNR-dependent compression of enhanced mel sub-band energies for compensation of noise effects on MFCC features, Pattern Recognition Letters, 28, 11, 1320–1326.
14. Nehe N.S., Holambe R.S. (2009), Isolated Word Recognition Using Normalized Teager Energy Cepstral Features, International Conference on Advances in Computing, Control, & Telecommunication Technologies, ACT ’09.
15. Paliwal K., Basu A. (1987), A speech enhancement method based on Kalman filtering, IEEE International Conference on Acoustics, Speech, and Signal Processing, IEEE ICASSP’87.
16. Rabiner L., Juang B.H. (1993), Fundamentals of speech recognition, PTR Prentice-Hall, Inc, Englewood Cliffs, New Jersey, USA.
17. Reid C.E., Passin T.B. (1992), Signal processing in C, John Wiley & Sons, Inc.
18. Rioul O., Duhamel P. (1992), Fast algorithms for discrete and continuous wavelet transforms, IEEE Transactions on Information Theory, 38, 2, 569–586.
19. Sambur M. (1978), Adaptive noise canceling for speech signals, IEEE Transactions on Acoustics, Speech and Signal Processing, 26, 5, 419–423.
20. Shannon B.J., Paliwal K.K. (2006), Feature extraction from higher-lag autocorrelation coefficients for robust speech recognition, Speech Communication, 48, 11, 1458–1485.
21. Traunmüller H. (1990), Analytical expressions for the tonotopic sensory scale, The Journal of the Acoustical Society of America, 88, 1, 97–100.
22. Tufekci Z., Gowdy J. (2000), Feature extraction using discrete wavelet transform for speech recognition, Proceedings of the IEEE, Southeastcon 2000.
23. Vaseghi S.V. (2008), Advanced digital signal processing and noise reduction, Wiley.
24. Yapanel U., Hansen J.H., Sarikaya R., Pellom B. (2001), Robust digit recognition in noise: an evaluation using the AURORA Corpus, Proc. Eurospeech.
25. Zhang X., Jiao Z., Zhao Z. (2005), The speech recognition based on the bark wavelet front-end processing, Fuzzy Systems and Knowledge Discovery, Springer, 302–305.
26. Zhang X., Bai J., Liang W. (2006), The speech recognition system based on bark wavelet MFCC, 8th International Conference on Signal Processing, IEEE.
27. Zhu D., Paliwal K.K. (2004), Product of power spectrum and group delay function for speech recognition, Proceedings on IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP’04).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-546622c8-efc5-440c-b4a3-55b9fe7c268b