Article title

Two-Microphone Dereverberation for Automatic Speech Recognition of Polish

Content
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
Reverberation is a common problem for many speech technologies, such as automatic speech recognition (ASR) systems. This paper investigates the novel combination of precedence, binaural and statistical independence cues for enhancing reverberant speech, prior to ASR, under these adverse acoustical conditions when two microphone signals are available. Results of the enhancement are evaluated in terms of relevant signal measures and accuracy for both English and Polish ASR tasks. These show inconsistencies between the signal and recognition measures, although in recognition the proposed method consistently outperforms all other combinations and the spectral-subtraction baseline.
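Two ingredients named in the abstract lend themselves to a brief illustration: the spectral-subtraction baseline (Boll, 1979; ref. 3 below) and the per-bin binaural cues that the proposed method combines with precedence and statistical-independence cues (cf. refs. 1, 12, 18). The sketch below is illustrative only, not the paper's implementation; the STFT settings, the oversubtraction factor, and the assumption that the first frames contain only noise are all choices made for the example.

```python
import numpy as np

FRAME_LEN, HOP = 512, 128  # illustrative STFT settings, not the paper's

def stft(x):
    """Hann-windowed short-time Fourier transform."""
    win = np.hanning(FRAME_LEN)
    n_frames = 1 + (len(x) - FRAME_LEN) // HOP
    frames = np.stack([x[i*HOP:i*HOP + FRAME_LEN] * win
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(X):
    """Inverse STFT with windowed overlap-add."""
    win = np.hanning(FRAME_LEN)
    frames = np.fft.irfft(X, n=FRAME_LEN, axis=1) * win
    out = np.zeros(HOP*(len(frames) - 1) + FRAME_LEN)
    norm = np.zeros_like(out)
    for i, f in enumerate(frames):
        out[i*HOP:i*HOP + FRAME_LEN] += f
        norm[i*HOP:i*HOP + FRAME_LEN] += win**2
    return out / np.maximum(norm, 1e-8)

def spectral_subtraction(x, noise_frames=10, alpha=2.0, beta=0.01):
    """Baseline enhancement after Boll (1979, ref. 3): subtract a noise
    magnitude estimate from every frame, flooring at beta*|X| to limit
    musical noise; alpha > 1 oversubtracts. Assumes the first
    noise_frames frames are speech-free (an assumption for this example)."""
    X = stft(x)
    noise_mag = np.abs(X[:noise_frames]).mean(axis=0)
    mag, phase = np.abs(X), np.angle(X)
    clean_mag = np.maximum(mag - alpha*noise_mag, beta*mag)
    return istft(clean_mag * np.exp(1j*phase))

def binaural_cues(Xl, Xr, eps=1e-8):
    """Per time-frequency-bin interaural level difference (dB) and phase
    difference (rad) between the two microphone STFTs -- the kind of
    binaural cue combined with precedence and statistical-independence
    cues in the paper (cf. refs. 1, 12, 18)."""
    ild = 20*np.log10((np.abs(Xl) + eps) / (np.abs(Xr) + eps))
    ipd = np.angle(Xl * np.conj(Xr))
    return ild, ipd
```

With a two-channel recording held in hypothetical arrays x_left and x_right, binaural_cues(stft(x_left), stft(x_right)) yields the ILD/IPD maps that model-based methods such as ref. 18 cluster to form time-frequency masks.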
Year
Pages
411-420
Physical description
Bibliography: 32 items, figures, tables
Authors
  • School of Engineering and Computing Sciences, Durham University, Durham, UK
  • Faculty of Computer Science, Electronics and Telecommunications, AGH University of Science and Technology, al. Mickiewicza 30, 30-059 Kraków, Poland
  • Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, Surrey GU2 7XH, UK
author
  • Faculty of Computer Science, Electronics and Telecommunications, AGH University of Science and Technology, al. Mickiewicza 30, 30-059 Kraków, Poland
Bibliography
  • 1. Alinaghi A., Wang W., Jackson P.J.B. (2011), Integrating binaural cues and blind source separation method for separating reverberant speech mixtures, [in:] Proc. of ICASSP, Prague, pp. 209-212.
  • 2. Blauert J. (1997), Spatial Hearing: The Psychophysics of Human Sound Localization, 2nd Edition, MIT Press.
  • 3. Boll S.F. (1979), Suppression of acoustic noise in speech using spectral subtraction, Acoustics, Speech, and Signal Processing, IEEE Trans., 27, 2, 113-120.
  • 4. Chien J.T., Lai P.Y. (2005), Car speech enhancement using a microphone array, Int. Journal of Speech Technology, 8, 1, 79-91.
  • 5. Drgas S., Kocinski J., Sek A. (2008), Logatom articulation index evaluation of speech enhanced by blind source separation and single-channel noise reduction, Archives of Acoustics, 33, 4, 455-474.
  • 6. Fukumori T., Nakayama M., Nishiura T., Yamashita Y. (2013), Estimation of speech recognition performance in noisy and reverberant environments using PESQ score and acoustic parameters, [in:] Signal and Information Processing Association Annual Summit and Conference (APSIPA), Asia-Pacific, pp. 1-4.
  • 7. Garofolo J.S., Lamel L.F., Fisher W.M., Fiscus J.G., Pallett D.S., Dahlgren N.L., Zue V. (1993), TIMIT acoustic-phonetic continuous speech corpus, Linguistic Data Consortium, Philadelphia.
  • 8. Gomez R., Kawahara T. (2010), Robust speech recognition based on dereverberation parameter optimization using acoustic model likelihood, Audio, Speech and Language Processing, IEEE Trans., 18, 7, 1708-1716.
  • 9. Grocholewski S. (1998), First database for spoken Polish, [in:] Proc. of International Conference on Language Resources and Evaluation, Granada, pp. 1059-1062.
  • 10. Hartmann W.M. (1999), How we localize sound, Physics Today, 52, 11, 24-29.
  • 11. Hinton G., Deng L., Yu D., Dahl G., Mohamed A., Jaitly N., Senior A., Vanhoucke V., Nguyen P., Sainath T., Kingsbury B. (2012), Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups, IEEE Signal Processing Magazine, 29, 6, 82-97.
  • 12. Hummersone C., Mason R., Brookes T. (2010), Dynamic precedence effect modeling for source separation in reverberant environments, Audio, Speech, and Language Processing, IEEE Trans., 18, 7, 1867-1871.
  • 13. Jeub M., Schafer M., Esch T., Vary P. (2010), Model-based dereverberation preserving binaural cues, Audio, Speech, and Language Processing, IEEE Trans., 18, 7, 1732-1745.
  • 14. Krishnamoorthy P., Prasanna S. (2009), Reverberant speech enhancement by temporal and spectral processing, Audio, Speech, and Language Processing, IEEE Trans., 17, 2, 253-266.
  • 15. Leonard R.G., Doddington G. (1993), TIDIGITS, Linguistic Data Consortium, Philadelphia.
  • 16. Li K., Guo Y., Fu Q., Yan Y. (2012), A two microphone-based approach for speech enhancement in adverse environments, [in:] Consumer Electronics (ICCE), 2012 IEEE International Conference, pp. 41-42.
  • 17. Litovsky R.Y., Colburn H.S., Yost W.A., Guzman S.J. (1999), The precedence effect, J. Acoust. Soc. Am., 106, 1633-1654.
  • 18. Mandel M.I., Weiss R.J., Ellis D. (2010), Model-based expectation-maximization source separation and localization, Audio, Speech, and Language Processing, IEEE Trans., 18, 2, 382-394.
  • 19. Nakatani T., Kinoshita K., Miyoshi M. (2007), Harmonicity-based blind dereverberation for single channel speech signals, Audio, Speech, and Language Processing, IEEE Trans., 15, 1, 80-95.
  • 20. Naylor P.A., Gaubitch N.D. (2005), Speech dereverberation, [in:] Proc. of Int. Workshop Acoust. Echo Noise Control, Eindhoven.
  • 21. Palomaki K.J., Brown G.J., Wang D. (2004), A binaural processor for missing data speech recognition in the presence of noise and small-room reverberation, Speech Communication, 43, 4, 361-378.
  • 22. Pearce D., Hirsch H. (2000), The aurora experimental framework for the performance evaluation of speech recognition systems under noisy conditions, [in:] ISCA ITRW ASR., pp. 29-32.
  • 23. Pearson J., Lin Q., Che C., Yuk D.S., Jin L., de Vries B., Flanagan J. (1996), Robust distant-talking speech recognition, [in:] Proc. of ICASSP, Atlanta, 1, 21-24.
  • 24. Sawada H., Araki S., Makino S. (2007), A two-stage frequency-domain blind source separation method for underdetermined convolutive mixtures, [in:] Proc. of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 139-142.
  • 25. Seltzer M.L., Raj B., Stern R.M. (2004), Likelihood-maximizing beamforming for robust hands-free speech recognition, Speech and Audio Processing, IEEE Trans., 12, 5, 489-498.
  • 26. Shi G., Aarabi P. (2003), Robust digit recognition using phase-dependent time-frequency masking, [in:] Proc. of ICASSP, Hong Kong, pp. 684-687.
  • 27. Vincent E., Gribonval R., Fevotte C. (2006), Performance measurement in blind audio source separation, Audio, Speech, and Language Processing, IEEE Trans., 14, 4, 1462-1469.
  • 28. Ward D.B., Kennedy R.A., Williamson R.C. (2001), Constant directivity beamforming, [in:] Microphone Arrays, Springer-Verlag.
  • 29. Wu M., Wang D. (2006), A two-stage algorithm for one-microphone reverberant speech enhancement, Audio, Speech, and Language Processing, IEEE Trans., 14, 774-784.
  • 30. Young S. J., Kershaw D., Odell J., Ollason D., Valtchev V., Woodland P. (2006), The HTK Book Version 3.4, Cambridge University Press.
  • 31. Ziółko B., Manandhar S., Wilson R.C., Ziółko M., Gałka J. (2008), Application of HTK to the Polish language, [in:] Proc. of International Conference on Audio, Language and Image Processing, Shanghai.
  • 32. Ziółko M., Gałka J., Ziółko B., Jadczyk T., Skurzok D., Masior M. (2011), Automatic speech recognition system dedicated for Polish, [in:] Proc. of Interspeech, Florence.
Document type
YADDA identifier
bwmeta1.element.baztech-7c0b722e-2934-4176-b0e7-42de7df144b0