Article title

Two-Microphone Dereverberation for Automatic Speech Recognition of Polish

Content
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
Reverberation is a common problem for many speech technologies, such as automatic speech recognition (ASR) systems. This paper investigates the novel combination of precedence, binaural and statistical independence cues for enhancing reverberant speech, prior to ASR, under these adverse acoustical conditions when two microphone signals are available. Results of the enhancement are evaluated in terms of relevant signal measures and accuracy for both English and Polish ASR tasks. These show inconsistencies between the signal and recognition measures, although in recognition the proposed method consistently outperforms all other combinations and the spectral-subtraction baseline.
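Two ingredients named in the abstract lend themselves to a brief illustration: the spectral-subtraction baseline (Boll, 1979; ref. 3 below) and the per-bin binaural cues that the proposed method combines with precedence and statistical-independence cues (cf. refs. 1, 12, 18). The sketch below is illustrative only, not the paper's implementation; the STFT settings, the oversubtraction factor, and the assumption that the first frames contain only noise are all choices made for the example.

```python
import numpy as np

FRAME_LEN, HOP = 512, 128  # illustrative STFT settings, not the paper's

def stft(x):
    """Hann-windowed short-time Fourier transform."""
    win = np.hanning(FRAME_LEN)
    n_frames = 1 + (len(x) - FRAME_LEN) // HOP
    frames = np.stack([x[i*HOP:i*HOP + FRAME_LEN] * win
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(X):
    """Inverse STFT with windowed overlap-add."""
    win = np.hanning(FRAME_LEN)
    frames = np.fft.irfft(X, n=FRAME_LEN, axis=1) * win
    out = np.zeros(HOP*(len(frames) - 1) + FRAME_LEN)
    norm = np.zeros_like(out)
    for i, f in enumerate(frames):
        out[i*HOP:i*HOP + FRAME_LEN] += f
        norm[i*HOP:i*HOP + FRAME_LEN] += win**2
    return out / np.maximum(norm, 1e-8)

def spectral_subtraction(x, noise_frames=10, alpha=2.0, beta=0.01):
    """Baseline enhancement after Boll (1979, ref. 3): subtract a noise
    magnitude estimate from every frame, flooring at beta*|X| to limit
    musical noise; alpha > 1 oversubtracts. Assumes the first
    noise_frames frames are speech-free (an assumption for this example)."""
    X = stft(x)
    noise_mag = np.abs(X[:noise_frames]).mean(axis=0)
    mag, phase = np.abs(X), np.angle(X)
    clean_mag = np.maximum(mag - alpha*noise_mag, beta*mag)
    return istft(clean_mag * np.exp(1j*phase))

def binaural_cues(Xl, Xr, eps=1e-8):
    """Per time-frequency-bin interaural level difference (dB) and phase
    difference (rad) between the two microphone STFTs -- the kind of
    binaural cue combined with precedence and statistical-independence
    cues in the paper (cf. refs. 1, 12, 18)."""
    ild = 20*np.log10((np.abs(Xl) + eps) / (np.abs(Xr) + eps))
    ipd = np.angle(Xl * np.conj(Xr))
    return ild, ipd
```

With a two-channel recording held in hypothetical arrays x_left and x_right, binaural_cues(stft(x_left), stft(x_right)) yields the ILD/IPD maps that model-based methods such as ref. 18 cluster to form time-frequency masks.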
Year
Pages
411-420
Physical description
Bibliography: 32 items, figures, tables
Authors
  • School of Engineering and Computing Sciences, Durham University, Durham, UK
  • Faculty of Computer Science, Electronics and Telecommunications, AGH University of Science and Technology, al. Mickiewicza 30, 30-059 Kraków, Poland
  • Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, Surrey GU2 7XH, UK
author
  • Faculty of Computer Science, Electronics and Telecommunications, AGH University of Science and Technology, al. Mickiewicza 30, 30-059 Kraków, Poland
Bibliography
  • 1. Alinaghi A., Wang W., Jackson P.J.B. (2011), Integrating binaural cues and blind source separation method for separating reverberant speech mixtures, [in:] Proc. of ICASSP, Prague, pp. 209-212.
  • 2. Blauert J. (1997), Spatial Hearing: The Psychophysics of Human Sound Localization, 2nd Edition, MIT Press.
  • 3. Boll S.F. (1979), Suppression of acoustic noise in speech using spectral subtraction, Acoustics, Speech, and Signal Processing, IEEE Trans., 27, 2, 113-120.
  • 4. Chien J.T., Lai P.Y. (2005), Car speech enhancement using a microphone array, Int. Journal of Speech Technology, 8, 1, 79-91.
  • 5. Drgas S., Kocinski J., Sek A. (2008), Logatom articulation index evaluation of speech enhanced by blind source separation and single-channel noise reduction, Archives of Acoustics, 33, 4, 455-474.
  • 6. Fukumori T., Nakayama M., Nishiura T., Yamashita Y. (2013), Estimation of speech recognition performance in noisy and reverberant environments using PESQ score and acoustic parameters, [in:] Signal and Information Processing Association Annual Summit and Conference (APSIPA), Asia-Pacific, pp. 1-4.
  • 7. Garofolo J.S., Lamel L.F., Fisher W.M., Fiscus J.G., Pallett D.S., Dahlgren N.L., Zue V. (1993), TIMIT acoustic-phonetic continuous speech corpus, Linguistic Data Consortium, Philadelphia.
  • 8. Gomez R., Kawahara T. (2010), Robust speech recognition based on dereverberation parameter optimization using acoustic model likelihood, Audio, Speech and Language Processing, IEEE Trans., 18, 7, 1708-1716.
  • 9. Grocholewski S. (1998), First database for spoken Polish, [in:] Proc. of International Conference on Language Resources and Evaluation, Granada, pp. 1059-1062.
  • 10. Hartmann W.M. (1999), How we localize sound, Physics Today, 52, 11, 24-29.
  • 11. Hinton G., Deng L., Yu D., Dahl G., Mohamed A., Jaitly N., Senior A., Vanhoucke V., Nguyen P., Sainath T., Kingsbury B. (2012), Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups, IEEE Signal Processing Magazine, 29, 6, 82-97.
  • 12. Hummersone C., Mason R., Brookes T. (2010), Dynamic precedence effect modeling for source separation in reverberant environments, Audio, Speech, and Language Processing, IEEE Trans., 18, 7, 1867-1871.
  • 13. Jeub M., Schafer M., Esch T., Vary P. (2010), Model-based dereverberation preserving binaural cues, Audio, Speech, and Language Processing, IEEE Trans., 18, 7, 1732-1745.
  • 14. Krishnamoorthy P., Prasanna S. (2009), Reverberant speech enhancement by temporal and spectral processing, Audio, Speech, and Language Processing, IEEE Trans., 17, 2, 253-266.
  • 15. Leonard R.G., Doddington G. (1993), TIDIGITS, Linguistic Data Consortium, Philadelphia.
  • 16. Li K., Guo Y., Fu Q., Yan Y. (2012), A two microphone-based approach for speech enhancement in adverse environments, [in:] Consumer Electronics (ICCE), 2012 IEEE International Conference, pp. 41-42.
  • 17. Litovsky R.Y., Colburn H.S., Yost W.A., Guzman S.J. (1999), The precedence effect, J. Acoust. Soc. Am., 106, 1633-1654.
  • 18. Mandel M.I., Weiss R.J., Ellis D. (2010), Model-based expectation-maximization source separation and localization, Audio, Speech, and Language Processing, IEEE Trans., 18, 2, 382-394.
  • 19. Nakatani T., Kinoshita K., Miyoshi M. (2007), Harmonicity-based blind dereverberation for single channel speech signals, Audio, Speech, and Language Processing, IEEE Trans., 15, 1, 80-95.
  • 20. Naylor P.A., Gaubitch N.D. (2005), Speech dereverberation, [in:] Proc. of Int. Workshop Acoust. Echo Noise Control, Eindhoven.
  • 21. Palomaki K.J., Brown G.J., Wang D. (2004), A binaural processor for missing data speech recognition in the presence of noise and small-room reverberation, Speech Communication, 43, 4, 361-378.
  • 22. Pearce D., Hirsch H. (2000), The aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions, [in:] ISCA ITRW ASR., pp. 29-32.
  • 23. Pearson J., Lin Q., Che C., Yuk D.S., Jin L., de Vries B., Flanagan J. (1996), Robust distant-talking speech recognition, [in:] Proc. of ICASSP, Atlanta, 1, 21-24.
  • 24. Sawada H., Araki S., Makino S. (2007), A two-stage frequency-domain blind source separation method for underdetermined convolutive mixtures, [in:] Proc. of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 139-142.
  • 25. Seltzer M.L., Raj B., Stern R.M. (2004), Likelihood-maximizing beamforming for robust hands-free speech recognition, Speech and Audio Processing, IEEE Trans., 12, 5, 489-498.
  • 26. Shi G., Aarabi P. (2003), Robust digit recognition using phase-dependent time-frequency masking, [in:] Proc. of ICASSP, Hong Kong, pp. 684-687.
  • 27. Vincent E., Gribonval R., Fevotte C. (2006), Performance measurement in blind audio source separation, Audio, Speech, and Language Processing, IEEE Trans., 14, 4, 1462-1469.
  • 28. Ward D.B., Kennedy R.A., Williamson R.C. (2001), Constant directivity beamforming, [in:] Microphone Arrays, Springer-Verlag.
  • 29. Wu M., Wang D. (2006), A two-stage algorithm for one-microphone reverberant speech enhancement, Audio, Speech, and Language Processing, IEEE Trans., 14, 774-784.
  • 30. Young S. J., Kershaw D., Odell J., Ollason D., Valtchev V., Woodland P. (2006), The HTK Book Version 3.4, Cambridge University Press.
  • 31. Ziółko B., Manandhar S., Wilson R.C., Ziółko M., Gałka J. (2008), Application of HTK to the Polish language, [in:] Proc. of International Conference on Audio, Language and Image Processing, Shanghai.
  • 32. Ziółko M., Gałka J., Ziółko B., Jadczyk T., Skurzok D., Masior M. (2011), Automatic speech recognition system dedicated for Polish, [in:] Proc. of Interspeech, Florence.
Document type
YADDA identifier
bwmeta1.element.baztech-7c0b722e-2934-4176-b0e7-42de7df144b0