Identifiers
Title variants
Publication languages
Abstracts
This paper describes research behind a Large-Vocabulary Continuous Speech Recognition (LVCSR) system for transcribing speeches delivered in the Polish Senate. The system comprises several components: a phonetic transcription system, language and acoustic model training systems, a Voice Activity Detector (VAD), an LVCSR decoder, and a subtitle generator and presentation system. Some modules relied on existing tools, while others had to be built from scratch; in either case, the authors used the most advanced techniques available to them at the time. Finally, several experiments were performed to compare the performance of newer and more conventional technologies.
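The abstract names the components but not how audio flows between them. As a rough illustration only, the Python sketch below chains three of them: a VAD splits the signal into speech segments, a decoder turns each segment into text, and a subtitle generator emits timed cues. Everything here is an assumption rather than the authors' implementation: the VAD is a toy frame-energy threshold, `decode` is a hypothetical callback standing in for the LVCSR decoder, and the function names and parameter values (`frame_ms`, `threshold`, `min_gap_frames`) are illustrative. The cues are written in WebVTT syntax, a format that appears in the bibliography; whether the described system uses it is likewise assumed.

```python
from typing import Callable, List, Tuple

Segment = Tuple[int, int]        # (start_ms, end_ms) of one speech region
Cue = Tuple[int, int, str]       # (start_ms, end_ms, subtitle text)


def energy_vad(samples: List[float], rate: int, frame_ms: int = 20,
               threshold: float = 0.01, min_gap_frames: int = 10) -> List[Segment]:
    """Toy VAD (assumed, not the paper's): a frame is speech if its mean
    energy exceeds `threshold`; pauses of up to `min_gap_frames` frames are
    bridged so a short breath does not split a segment."""
    frame_len = rate * frame_ms // 1000
    n_frames = len(samples) // frame_len
    voiced = [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len > threshold
        for i in range(n_frames)
    ]

    segments: List[Segment] = []
    start, gap = None, 0
    for i, is_speech in enumerate(voiced):
        if is_speech:
            if start is None:
                start = i          # first voiced frame of a new segment
            gap = 0
        elif start is not None:
            gap += 1
            if gap > min_gap_frames:
                # The segment ends right after its last voiced frame, i - gap.
                segments.append((start * frame_ms, (i - gap + 1) * frame_ms))
                start, gap = None, 0
    if start is not None:          # speech ran until the end of the signal
        segments.append((start * frame_ms, (n_frames - gap) * frame_ms))
    return segments


def vtt_time(ms: int) -> str:
    """Format milliseconds as a WebVTT timestamp, hh:mm:ss.ttt."""
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, frac = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{frac:03d}"


def to_webvtt(cues: List[Cue]) -> str:
    """Serialize cues as a minimal WebVTT file: header, then blank-line-separated cues."""
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines += [f"{vtt_time(start)} --> {vtt_time(end)}", text, ""]
    return "\n".join(lines)


def transcribe_to_vtt(samples: List[float], rate: int,
                      decode: Callable[[List[float]], str]) -> str:
    """VAD -> decoder -> subtitles; `decode` is a hypothetical stand-in for the LVCSR decoder."""
    cues: List[Cue] = []
    for start_ms, end_ms in energy_vad(samples, rate):
        chunk = samples[start_ms * rate // 1000:end_ms * rate // 1000]
        cues.append((start_ms, end_ms, decode(chunk)))
    return to_webvtt(cues)


if __name__ == "__main__":
    # Synthetic 1 kHz signal: 0.2 s silence, 0.4 s "speech", 0.4 s silence.
    signal = [0.0] * 200 + [0.5] * 400 + [0.0] * 400
    print(transcribe_to_vtt(signal, 1000, lambda chunk: f"{len(chunk)} samples"))
```

Keeping the decoder behind a callback lets the segmentation and subtitle stages be exercised without a trained model; in a real system the stub would be replaced by a call into the recognizer, and the energy threshold by a proper VAD.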
Publisher
Journal
Year
Volume
Pages
501–509
Physical description
Bibliography: 40 items; tables
Authors
author
- Polish-Japanese Institute of Information Technology, Koszykowa 86, 02-008 Warszawa, Poland
author
- Polish-Japanese Institute of Information Technology, Koszykowa 86, 02-008 Warszawa, Poland
author
- Polish-Japanese Institute of Information Technology, Koszykowa 86, 02-008 Warszawa, Poland
Bibliography
- 1. Brocki, Ł. (2010a). Koneksjonistyczny model języka polskiego [A connectionist model of the Polish language]. In XII International PhD Workshop OWD 2010.
- 2. Brocki, Ł. (2010b). Koneksjonistyczny Model Języka w Systemach Rozpoznawania Mowy [A Connectionist Language Model in Speech Recognition Systems]. PhD thesis, Polish-Japanese Institute of Information Technology.
- 3. Brocki, Ł., Koržinek, D., and Marasek, K. (2006). Recognizing connected digit strings using neural networks. In Text, Speech and Dialogue, Springer.
- 4. Brocki, Ł., Koržinek, D., and Marasek, K. (2014). Improved factorization of a connectionist language model for single-pass real-time speech recognition. In Andreasen, T., Christiansen, H., Cubero, J.-C., and Raś, Z. (Eds.), Foundations of Intelligent Systems, volume 8502 of Lecture Notes in Computer Science, pp. 355–364, Springer International Publishing.
- 5. Brocki, Ł., Koržinek, D., and Marasek, K. (2008). Telephony-based voice portal for a university.
- 6. Brocki, Ł., Marasek, K., and Koržinek, D. (2012a). Connectionist language model for Polish. In Intelligent Tools for Building a Scientific Information Platform, Springer.
- 7. Brocki, Ł., Marasek, K., and Koržinek, D. (2012b). Multiple model text normalization for the Polish language. In Foundations of Intelligent Systems, Springer.
- 8. Demenko, G., Grocholewski, S., Klessa, K., Ogórkiewicz, J., Wagner, A., Lange, M., Sledzinski, D., and Cylwik, N. (2008). JURISDIC: Polish speech database for taking dictation of legal texts. In LREC.
- 9. Eide, E. and Gish, H. (1996). A parametric approach to vocal tract length normalization. In 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-96), Conference Proceedings, volume 1, pp. 346–348, IEEE.
- 10. Federico, M., Bertoldi, N., and Cettolo, M. (2008). IRSTLM: an open source toolkit for handling large scale language models. In INTERSPEECH.
- 11. Glass, J. R., Hsu, B.-J., et al. (2009). Language modeling for limited-data domains.
- 12. Graves, A., Eck, D., Beringer, N., and Schmidhuber, J. (2004). Biologically plausible speech recognition with LSTM neural nets. In Biologically Inspired Approaches to Advanced Information Technology, Springer.
- 13. Graves, A. and Schmidhuber, J. (2005). Framewise phoneme classification with bidirectional LSTM and other neural network architectures. Neural Networks, 18(5):602–610.
- 14. Hickson, I. (2012). WebVTT. Living standard. World Wide Web Consortium.
- 15. Hinton, G. E., Osindero, S., and Teh, Y.-W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7):1527–1554.
- 16. Huijbregts, M. A. H. (2008). Segmentation, diarization and speech transcription: surprise data unraveled.
- 17. Jelinek, F. (1997). Statistical Methods for Speech Recognition. MIT Press.
- 18. Katsamanis, A., Black, M., Georgiou, P. G., Goldstein, L., and Narayanan, S. (2011). SailAlign: Robust long speech-text alignment. In Proc. of Workshop on New Tools and Methods for Very-Large Scale Phonetics Research.
- 19. Kneser, R. and Ney, H. (1995). Improved backing-off for m-gram language modeling. In 1995 International Conference on Acoustics, Speech, and Signal Processing (ICASSP-95), volume 1, IEEE.
- 20. Koržinek, D. and Brocki, Ł. (2007). Grammar-based automatic speech recognition system for the Polish language. In Recent Advances in Mechatronics, Springer.
- 21. Kos, M., Vlaj, D., and Kacic, Z. (1996). SloParl: Slovenian parliamentary speech and text corpus for large vocabulary continuous speech recognition.
- 22. Lee, A., Kawahara, T., and Shikano, K. (2001). Julius: an open source real-time large vocabulary recognition engine.
- 23. Lööf, J., Bisani, M., Gollan, C., Heigold, G., Hoffmeister, B., Plahl, C., Schlüter, R., and Ney, H. (2006). The 2006 RWTH parliamentary speeches transcription system. In INTERSPEECH.
- 24. Marasek, K. (2012). TED Polish-to-English translation system for the IWSLT 2012. Proceedings IWSLT 2012.
- 25. Marasek, K., Brocki, Ł., Koržinek, D., Szklanny, K., and Gubrynowicz, R. (2009). User-centered design for a voice portal. In Aspects of Natural Language Processing, Springer.
- 26. Michalewicz, Z. (1996). Genetic Algorithms + Data Structures = Evolution Programs. Springer.
- 27. Miłkowski, M. (2012). The Polish language in the digital age. Springer.
- 28. De Mori, R. (1998). Spoken Dialogue With Computers (Signal Processing and its Applications). Academic Press.
- 29. Povey, D., Burget, L., Agarwal, M., Akyazi, P., Feng, K., Ghoshal, A., Glembek, O., Goel, N. K., Karafiát, M., Rastrow, A., et al. (2010). Subspace Gaussian mixture models for speech recognition. In 2010 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), IEEE.
- 30. Povey, D., Ghoshal, A., Boulianne, G., Burget, L., Glembek, O., Goel, N., Hannemann, M., Motlicek, P., Qian, Y., Schwarz, P., et al. (2011). The Kaldi speech recognition toolkit. In IEEE 2011 Workshop on Automatic Speech Recognition and Understanding.
- 31. Pražák, A., Psutka, J. V., Hoidekr, J., Kanis, J., Müller, L., and Psutka, J. (2006). Automatic online subtitling of the Czech parliament meetings. In Text, Speech and Dialogue, Springer.
- 32. Przepiórkowski, A., Bańko, M., Górski, R., and Lewandowska-Tomaszczyk, B. (2012). Narodowy Korpus Języka Polskiego [National Corpus of Polish]. Wydawnictwo Naukowe PWN, Warszawa.
- 33. Psutka, J. V. (2007). Benefit of maximum likelihood linear transform (MLLT) used at different levels of covariance matrices clustering in ASR systems. In Text, Speech and Dialogue, Springer.
- 34. Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286.
- 35. Robinson, T., Hochberg, M., and Renals, S. (1996). The use of recurrent neural networks in continuous speech recognition. In Automatic Speech and Speaker Recognition, Springer.
- 36. Romero-Fresco, P. (2011). Subtitling through Speech Recognition: Respeaking. St. Jerome Publishing.
- 37. Stolcke, A. et al. (2002). SRILM: an extensible language modeling toolkit. In INTERSPEECH.
- 38. Veselý, K., Ghoshal, A., Burget, L., and Povey, D. (2013). Sequence-discriminative training of deep neural networks.
- 39. Wells, J. C. Polish SAMPA. http://www.phon.ucl.ac.uk/home/sampa/polish.htm
- 40. Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D., Liu, X., Moore, G., Odell, J., Ollason, D., Povey, D., et al. (2002). The HTK Book. Cambridge University Engineering Department, 3.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-57c3cab5-8f4d-458b-b84b-ce72dc54af14