Comparison of optimization algorithms of connectionist temporal classifier for speech recognition system

Amirgaliyev, Yedilkhan; Darkhan, Kuanyshbay; Shoiynbek, Aisultan

doi:10.35784/IAPGOS.234

Title variant (Polish)

Porównanie algorytmów optymalizacji klasyfikatora czasowego do systemu rozpoznawania mowy

Abstract

This paper evaluates and compares the performance of three well-known optimization algorithms (Adagrad, Adam and Momentum) for faster training of the neural network used by the CTC algorithm for speech recognition. The CTC model uses a recurrent neural network, specifically a Long Short-Term Memory (LSTM) network, which is an effective and widely used model. The data were obtained from the VCTK corpus of the University of Edinburgh. The optimization algorithms are evaluated by label error rate (LER) and CTC loss.
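
As a point of reference, the Python sketch below shows the standard textbook update rules for the three optimizers named above (heavy-ball momentum, Adagrad as in Duchi et al. [6], Adam as in Kingma and Ba [10]), together with a label error rate computed as edit distance divided by reference length. This record does not state the paper's exact formulations, hyperparameters or LER definition, so the function names (`momentum_step`, `adagrad_step`, `adam_step`, `minimise`, `label_error_rate`), the learning rates and the toy objective f(w) = (w - 2)^2 are illustrative assumptions. The sketch does not reproduce the paper's LSTM/CTC experiments on VCTK.

```python
"""Hypothetical sketch: textbook Momentum, Adagrad and Adam updates on a toy
1-D quadratic, plus LER as normalised edit distance. Not the authors' code;
all names and hyperparameters are illustrative assumptions."""
import math


def momentum_step(w, g, state, lr=0.1, beta=0.9):
    # Heavy-ball momentum (one common form): v <- beta*v + g ; w <- w - lr*v
    v = beta * state.get("v", 0.0) + g
    state["v"] = v
    return w - lr * v


def adagrad_step(w, g, state, lr=1.0, eps=1e-8):
    # Adagrad: accumulate squared gradients and divide the step by their root.
    # A larger base rate is assumed because the effective step keeps shrinking.
    acc = state.get("acc", 0.0) + g * g
    state["acc"] = acc
    return w - lr * g / (math.sqrt(acc) + eps)


def adam_step(w, g, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Adam: exponential moving averages of g and g^2 with bias correction
    t = state.get("t", 0) + 1
    m = b1 * state.get("m", 0.0) + (1 - b1) * g
    v = b2 * state.get("v", 0.0) + (1 - b2) * g * g
    state.update(t=t, m=m, v=v)
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


def minimise(step, steps=500, w0=5.0):
    # Toy objective f(w) = (w - 2)^2, gradient 2(w - 2), minimum at w = 2
    w, state = w0, {}
    for _ in range(steps):
        w = step(w, 2.0 * (w - 2.0), state)
    return w


def label_error_rate(ref, hyp):
    # Levenshtein distance between label sequences / reference length
    # (assumes a non-empty reference sequence)
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i]
        for j, h in enumerate(hyp, start=1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    for name, step in [("Momentum", momentum_step),
                       ("Adagrad", adagrad_step),
                       ("Adam", adam_step)]:
        print(f"{name}: w = {minimise(step):.4f}")
    print(label_error_rate("hello", "helo"))  # one deleted label out of five
```

Running the script prints a final w close to the minimum at 2 for each optimizer and an LER of 0.2 for the "hello"/"helo" pair. In a real CTC setup the same rules are applied element-wise to every network weight, and the LER is computed on decoded label sequences.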

Keywords

recurrent neural network, search method, acoustic, systems modeling language

Publisher

Wydawnictwo Politechniki Lubelskiej

Journal

Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska

Year

2019

Volume

Vol. 9, no. 3

Pages

54–57

Physical description

Bibliography: 16 items, figures, tables

Authors

Author

Amirgaliyev Yedilkhan

amir_ed@mail.ru

Institute of Information and Computational Technologies CS MES RK, Almaty, Kazakhstan
Suleyman Demirel University, Almaty, Kazakhstan

Author

Darkhan Kuanyshbay

darkhan.kuanyshbay@sdu.edu.kz

Institute of Information and Computational Technologies CS MES RK, Almaty, Kazakhstan
Suleyman Demirel University, Almaty, Kazakhstan

Author

Shoiynbek Aisultan

aisultan.shoiynbek@sdu.edu.kz

Suleyman Demirel University, Almaty, Kazakhstan

References

[1] Amirgaliev Y., Hahn M., Mussabayev T.: The speech signal segmentation algorithm using pitch synchronous analysis. Open Computer Science 7(1)/2017, 1–8.
[2] Andrychowicz M., Denil M., Colmenarejo S.G., Hoffman M.W., Pfau D., Schaul T., Shillingford B., de Freitas N.: Learning to learn by gradient descent by gradient descent. 30th Conference on Neural Information Processing Systems (NIPS), 2016.
[3] Bahdanau D., Cho K., Bengio Y.: Neural machine translation by jointly learning to align and translate. Proc. ICLR, 2015.
[4] Bengio Y., Ducharme R., Vincent P., Jauvin C.: A Neural Probabilistic Language Model. Journal of Machine Learning Research 3/2003, 1137–1155.
[5] Bottou L.: Large-Scale Machine Learning with Stochastic Gradient Descent. NEC Labs America, Princeton.
[6] Duchi J., Hazan E., Singer Y.: Adaptive Subgradient Methods for Online Learning and Stochastic Optimization. Journal of Machine Learning Research 12/2011, 2121–2159.
[7] Gales M., Young S.: The Application of Hidden Markov Models in Speech Recognition. Foundations and Trends in Signal Processing 1(3)/2007, 195–304.
[8] Graves A., Fernandez S., Gomez F., Schmidhuber J.: Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, 2006.
[9] Graves A., Jaitly N.: Towards End-to-End Speech Recognition with Recurrent Neural Networks. Proceedings of the 31st International Conference on Machine Learning, 2014.
[10] Kingma D.P., Ba J.: Adam: A Method for Stochastic Optimization. Proc. 3rd International Conference on Learning Representations (ICLR), 2015, arXiv:1412.6980v9.
[11] Loizou N., Richtarik P.: Momentum and Stochastic Momentum for Stochastic Gradient, Newton, Proximal Point and Subspace Descent Methods. 2017, arXiv:1712.09677v2.
[12] Mussabayev R.R., Amirgaliyev N., Tairova A.T., Mussabayev T.R., Koibagarov K.C.: The technology for the automatic formation of the personal digital voice pattern. Application of Information and Communication Technologies (AICT), 2016.
[13] Schuster M., Paliwal K.K.: Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing 45(11)/1997, 2673–2681.
[14] Sutskever I., Vinyals O., Le Q.V.: Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems 2014, 3104–3112.
[15] Wiseman S., Rush A.M.: Sequence-to-Sequence Learning as Beam-Search Optimization. Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2016.
[16] Yu D., Li J.: Recent Progresses in Deep Learning based Acoustic Models. Tencent AI Lab, Microsoft AI and Research, 2018.

Notes

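Record prepared with funding from the MNiSW (Polish Ministry of Science and Higher Education), agreement No. 461252, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science), module: Popularisation of science and promotion of sport (2020).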

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-f73d64d4-57b6-4535-b964-89fabf8ea251