Article title

Deep Belief Neural Networks and Bidirectional Long-Short Term Memory Hybrid for Speech Recognition

Content
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
This paper describes a Deep Belief Neural Network (DBNN) and Bidirectional Long-Short Term Memory (LSTM) hybrid used as an acoustic model for speech recognition. Many independent researchers have demonstrated that DBNNs outperform other known machine learning frameworks in terms of speech recognition accuracy, a superiority that stems from their deep architecture. However, a trained DBNN is simply a feed-forward network with no internal memory, unlike Recurrent Neural Networks (RNNs), which are Turing complete and do possess internal memory, allowing them to exploit longer context. In this paper, an experiment is performed in which a DBNN is hybridized with an advanced bidirectional RNN that processes its output. Results show that using the new DBNN-BLSTM hybrid as the acoustic model for Large Vocabulary Continuous Speech Recognition (LVCSR) increases word recognition accuracy. However, the new model has many parameters and may in some cases suffer performance issues in real-time applications.
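The abstract describes a two-stage acoustic model: a trained DBN applied as a plain feed-forward network producing per-frame outputs, which a bidirectional LSTM then processes to add temporal context. The paper publishes no code, so the following is only an illustrative NumPy sketch of that pipeline's forward pass; all layer sizes (13 input features, two 64-unit DBN layers, 40 output classes, 32 LSTM cells) and weight initializations are hypothetical, and training is omitted entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dbnn_forward(frames, weights):
    """Trained DBN used as a feed-forward net: per-frame class posteriors."""
    h = frames
    for W, b in weights[:-1]:
        h = sigmoid(h @ W + b)
    W, b = weights[-1]
    return softmax(h @ W + b)              # shape (T, K)

def lstm_pass(xs, W, U, b, H):
    """One-direction LSTM forward pass over a sequence xs of shape (T, D)."""
    h = np.zeros(H); c = np.zeros(H); out = []
    for x in xs:
        z = W @ x + U @ h + b              # gates stacked: input, forget, cell, output
        i, f, g, o = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)
        h = o * np.tanh(c)
        out.append(h)
    return np.stack(out)

def blstm(xs, fwd, bwd, H):
    """Bidirectional LSTM: concatenate forward and time-reversed backward passes."""
    hf = lstm_pass(xs, *fwd, H)
    hb = lstm_pass(xs[::-1], *bwd, H)[::-1]
    return np.concatenate([hf, hb], axis=1)  # shape (T, 2H)

# Hypothetical dimensions: T frames of D features, K classes, H LSTM cells.
T, D, K, H = 50, 13, 40, 32
dbn_weights = [(rng.normal(0, 0.1, (D, 64)), np.zeros(64)),
               (rng.normal(0, 0.1, (64, 64)), np.zeros(64)),
               (rng.normal(0, 0.1, (64, K)), np.zeros(K))]

def lstm_params(D_in, H):
    return (rng.normal(0, 0.1, (4 * H, D_in)),   # input weights
            rng.normal(0, 0.1, (4 * H, H)),      # recurrent weights
            np.zeros(4 * H))                     # biases

frames = rng.normal(size=(T, D))
posteriors = dbnn_forward(frames, dbn_weights)                         # DBNN stage
context = blstm(posteriors, lstm_params(K, H), lstm_params(K, H), H)   # BLSTM stage
print(context.shape)  # (50, 64)
```

The key design point from the abstract is visible in the data flow: the BLSTM does not see raw acoustic features, only the DBNN's per-frame output, and its concatenated forward/backward state gives each frame context from both past and future.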
Year
Pages
191–195
Physical description
Bibliography: 21 items, figures, tables.
Authors
author
  • Polish-Japanese Academy of Information Technology, Koszykowa 86, 02-008 Warszawa, Poland
author
  • Polish-Japanese Academy of Information Technology, Koszykowa 86, 02-008 Warszawa, Poland
Bibliography
  • 1. Ackley D., Hinton G., Sejnowski T. (1985), A Learning Algorithm for Boltzmann Machines, Cognitive Science, 9, 1, 147–169.
  • 2. Bishop C.M. (1995), Neural networks for pattern recognition, Oxford University Press, ISBN 0-19-853864-2.
  • 3. Brocki Ł., Korzinek D., Marasek K. (2006), Recognizing Connected Digit Strings Using Neural Networks, TSD 2006, Brno, Czech Republic.
  • 4. Dahl G.E., Yu D., Deng L., Acero A. (2011), Large vocabulary continuous speech recognition with context-dependent DBN-HMMS, ICASSP 2011, 4688–4691.
  • 5. Federico M., Bertoldi N., Cettolo M. (2008), IRSTLM: an Open Source Toolkit for Handling Large Scale Language Models, Proceedings of Interspeech, Brisbane, Australia.
  • 6. Graves A., Fernandez S., Schmidhuber J. (2005), Bidirectional LSTM networks for improved phoneme classification and recognition, ICANN 2005, Warsaw, Poland, pp. 799–804.
  • 7. Graves A., Rahman A., Hinton G.E. (2013), Speech Recognition with Deep Recurrent Neural Networks, ICASSP 2013, Vancouver, Canada.
  • 8. Graves A., Fernández S., Gomez F., Schmidhuber J. (2006), Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks, ICML 2006, Pittsburgh, USA, pp. 369–376.
  • 9. Graves A., Eck D., Beringer N., Schmidhuber J. (2004), Biologically Plausible Speech Recognition with LSTM Neural Nets, Bio-ADIT 2004, Lausanne, Switzerland, pp. 175–184.
  • 10. Hinton G.E., Osindero S., Teh Y. (2006), A fast learning algorithm for deep belief nets, Neural Computation, 18, pp. 1527–1554.
  • 11. Hochreiter S., Schmidhuber J. (1997), Long Short-Term Memory, Neural Computation, 9, 8, 1735–1780.
  • 12. Korzinek D., Marasek K., Brocki Ł. (2011), Automatic Transcription of Polish Radio and Television Broadcast Audio, Intelligent Tools for Building a Scientific Information Platform, Springer, 467, 489–497.
  • 13. Lee A., Kawahara T., Shikano K. (2001), Julius an open source real-time large vocabulary recognition engine, [in:] Proc. European Conference on Speech Communication and Technology (EUROSPEECH), pp. 1691–1694.
  • 14. Mohamed A., Dahl G.E., Hinton G.E. (2009), Deep belief networks for phone recognition, [in:] NIPS Workshop on Deep Learning for Speech Recognition and Related Applications.
  • 15. Rabiner L.R. (1989), A tutorial on Hidden Markov models and selected applications in speech recognition, Proc. IEEE, pp. 257–286.
  • 16. Schuster M., Paliwal K.K. (1997), Bidirectional recurrent neural networks, IEEE Transactions on Signal Processing, 45, 2673–2681, November 1997.
  • 17. Stolcke A. (2002), SRILM – An Extensible Language Modeling Toolkit, Speech Technology and Research Laboratory SRI International, Menlo Park, USA.
  • 18. Werbos P.J. (1990), Backpropagation through time: what it does and how to do it, Proc. IEEE, 78, 10, 1550–1560.
  • 19. Wöllmer M., Eyben F., Graves A., Schuller B., Rigoll G. (2009), A Tandem BLSTM-DBN architecture for keyword spotting with enhanced context modeling, NOLISP 2009, Vic, Spain.
  • 20. Young S. (2000), The HTK Book, Cambridge University Press.
  • 21. www.scholarpedia.org/article/Deep_belief_networks
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-b8da770d-f34b-4cd1-90d7-2761ca3f4b99