An optimized parallel implementation of non-iteratively trained recurrent neural networks

El Zini, Julia; Rizk, Yara; Awad, Mariette

doi:10.2478/jaiscr-2021-0003

Artykuł - szczegóły

Tytuł artykułu

An optimized parallel implementation of non-iteratively trained recurrent neural networks

Autorzy

El Zini Julia , Rizk Yara , Awad Mariette

Treść / Zawartość

Pełne teksty:

Pobierz

Identyfikatory

DOI

10.2478/jaiscr-2021-0003

Warianty tytułu

Języki publikacji

Abstrakty

Recurrent neural networks (RNN) have been successfully applied to various sequential decision-making tasks, natural language processing applications, and time-series predictions. Such networks are usually trained through back-propagation through time (BPTT) which is prohibitively expensive, especially when the length of the time dependencies and the number of hidden neurons increase. To reduce the training time, extreme learning machines (ELMs) have been recently applied to RNN training, reaching a 99% speedup on some applications. Due to its non-iterative nature, ELM training, when parallelized, has the potential to reach higher speedups than BPTT. In this work, we present Opt-PR-ELM, an optimized parallel RNN training algorithm based on ELM that takes advantage of the GPU shared memory and of parallel QR factorization algorithms to efficiently reach optimal solutions. The theoretical analysis of the proposed algorithm is presented on six RNN architectures, including LSTM and GRU, and its performance is empirically tested on ten time-series prediction applications. Opt- PR-ELM is shown to reach up to 461 times speedup over its sequential counterpart and to require up to 20x less time to train than parallel BPTT. Such high speedups over new generation CPUs are extremely crucial in real-time applications and IoT environments.

Słowa kluczowe

GPU implementation parallelization recurrent neural network RNN Long-short Term Memory LSTM Gated Recurrent Unit GRU Extreme Learning Machines ELM non-iterative training

Wydawca

University of Social Sciences

Czasopismo

Journal of Artificial Intelligence and Soft Computing Research

Rocznik

2021

Tom

Vol. 11, No. 1

Strony

33--50

Opis fizyczny

Bibliogr. 44 poz., rys.

Twórcy

autor

El Zini Julia

jwe04@aub.edu.lb

Department of Electrical and Computer Engineering American University of Beirut

autor

Rizk Yara

yar01@aub.edu.lb

Department of Electrical and Computer Engineering American University of Beirut

autor

Awad Mariette

mariette.awad@aub.edu.lb

Department of Electrical and Computer Engineering American University of Beirut

Bibliografia

[1] Yoshua Bengio, Patrice Simard, Paolo Frasconi, et al. Learning long-term dependencies with gradient descent is difficult. IEEE transactions on neural networks, 5(2):157–166, 1994.
[2] Stephen A Billings. Nonlinear system identification: NARMAX methods in the time, frequency, and spatio-temporal domains. John Wiley & Sons, 2013.
[3] Armando Blanco, Miguel Delgado, and Maria C Pegalajar. A real-coded genetic algorithm for training recurrent neural networks. Neural networks, 14(1):93–105, 2001.
[4] Kyunghyun Cho, Bart Van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio. On the properties of neural machine translation: Encoder-decoder approaches. arXiv preprint arXiv:1409.1259, 2014.
[5] Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using rnn encoder-decoder for statistical machine translation. arXiv preprint arXiv:1406.1078, 2014.
[6] Jan K Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, and Yoshua Bengio. Attention-based models for speech recognition. In Advances in neural information processing systems, pages 577–585, 2015.
[7] Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014.
[8] Jerome T Connor, R Douglas Martin, and Les E Atlas. Recurrent neural networks and robust time series prediction. IEEE transactions on neural networks, 5(2):240–254, 1994.
[9] Jeffrey L Elman. Finding structure in time. Cognitive science, 14(2):179–211, 1990.
[10] Ömer Faruk Ertugrul. Forecasting electricity load by a novel recurrent extreme learning machines approach. International Journal of Electrical Power & Energy Systems, 78:429–435, 2016.
[11] Martín Abadi et al. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. Software available from tensorflow.org.
[12] Alex Graves, Navdeep Jaitly, and Abdel-rahman Mohamed. Hybrid speech recognition with deep bidirectional lstm. In 2013 IEEE workshop on automatic speech recognition and understanding, pages 273–278. IEEE, 2013.
[13] Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. Speech recognition with deep recurrent neural networks. In 2013 IEEE international conference on acoustics, speech and signal processing, pages 6645–6649. IEEE, 2013.
[14] Qing He, Tianfeng Shang, Fuzhen Zhuang, and Zhongzhi Shi. Parallel extreme learning machine for regression based on mapreduce. Neurocomputing, 102:52–58, 2013.
[15] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
[16] Guang-Bin Huang, Qin-Yu Zhu, Chee-Kheong Siew, et al. Extreme learning machine: a new learning scheme of feedforward neural networks. Neural networks, 2:985–990, 2004.
[17] Shan Huang, Botao Wang, Junhao Qiu, Jitao Yao, Guoren Wang, and Ge Yu. Parallel ensemble of online sequential extreme learning machine based on mapreduce. Neurocomputing, 174:352–367, 2016.
[18] Weikuan Jia, Dean Zhao, Yuanjie Zheng, and Sujuan Hou. A novel optimized ga–elman neural network algorithm. Neural Computing and Applications, 31(2):449–459, 2019.
[19] Michael I Jordan. Serial order: A parallel distributed processing approach. In Advances in psychology, volume 121, pages 471–495. Elsevier, 1997.
[20] Viacheslav Khomenko, Oleg Shyshkov, Olga Radyvonenko, and Kostiantyn Bokhan. Accelerating recurrent neural network training using sequence bucketing and multi-gpu data parallelization. In IEEE First International Conference on Data Stream Mining & Processing, pages 100–103. IEEE, 2016.
[21] Siu Kwan Lam, Antoine Pitrou, and Stanley Seibert. Numba: A llvm-based python jit compiler. In Proceedings of the second Workshop on the LLVM Compiler Infrastructure in HPC, pages 1–6. ACM, 2015.
[22] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436, 2015.
[23] Jun Liu, Amir Shahroudy, Dong Xu, and Gang Wang. Spatio-temporal lstm with trust gates for 3d human action recognition. In European Conference on Computer Vision, pages 816–833. Springer, 2016.
[24] Jun Liu, Gang Wang, Ling-Yu Duan, Kamila Abdiyeva, and Alex C Kot. Skeleton-based human action recognition with global context-aware attention lstm networks. IEEE Transactions on Image Processing, 27(4):1586–1599, 2017.
[25] James Martens and Ilya Sutskever. Learning recurrent neural networks with hessian-free optimization. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), pages 1033–1040. Citeseer, 2011.
[26] Travis Oliphant. Guide to NumPy. 01 2006.
[27] Peng Ouyang, Shouyi Yin, and Shaojun Wei. A fast and power efficient architecture to parallelize lstm based rnn for cognitive intelligence applications. In Proceedings of the 54th Annual Design Automation Conference 2017, pages 1–6. ACM, 2017.
[28] Yoh-Han Pao, Gwang-Hoon Park, and Dejan J Sobajic. Learning and generalization characteristics of the random vector functional-link net. Neurocomputing, 6(2):163–180, 1994.
[29] Jin-Man Park and Jong-Hwan Kim. Online recurrent extreme learning machine and its application to time-series prediction. In 2017 International Joint Conference on Neural Networks (IJCNN), pages 1983–1990. IEEE, 2017.
[30] Yara Rizk and Mariette Awad. On extreme learning machines in sequential and time series prediction: A non-iterative and approximate training algorithm for recurrent neural networks. Neurocomputing, 325:1–19, 2019.
[31] Jürgen Schmidhuber. Deep learning in neural networks: An overview. Neural networks, 61:85–117, 2015.
[32] Wouter F Schmidt, Martin A Kraaijveld, and Robert PW Duin. Feedforward neural networks with random weights. In 11th IAPR International Conference on Pattern Recognition. Vol. II. Conference B: Pattern Recognition Methodology and Systems, pages 1–4. IEEE, 1992.
[33] Xavier Sierra-Canto, Francisco Madera-Ramirez, and Victor Uc-Cetina. Parallel training of a back-propagation neural network using cuda. In 2010 Ninth International Conference on Machine Learning and Applications, pages 307–312. IEEE, 2010.
[34] Zhiyuan Tang, Ying Shi, Dong Wang, Yang Feng, and Shiyue Zhang. Memory visualization for gated recurrent neural networks in speech recognition. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2736–2740. IEEE, 2017.
[35] Hubert AB Te Braake and Gerrit Van Straten. Random activation weight neural net (rawn) for fast non-iterative training. Engineering Applications of Artificial Intelligence, 8(1):71–80, 1995.
[36] Mark Van Heeswijk, Yoan Miche, Erkki Oja, and Amaury Lendasse. Gpu-accelerated and parallelized elm ensembles for large-scale regression. Neurocomputing, 74(16):2430–2437, 2011.
[37] Botao Wang, Shan Huang, Junhao Qiu, Yu Liu, and Guoren Wang. Parallel online sequential extreme learning machine based on mapreduce. Neurocomputing, 149:224–232, 2015.
[38] Shang Wang, Yifan Bai, and Gennady Pekhimenko. Scaling back-propagation by parallel scan algorithm. arXiv preprint arXiv:1907.10134, 2019.
[39] Xiaoyu Wang and Yong Huang. Convergence study in extended kalman filter-based training of recurrent neural networks. IEEE Transactions on Neural Networks, 22(4):588–600, 2011.
[40] Paul J Werbos et al. Backpropagation through time: what it does and how to do it. Proceedings of the IEEE, 78(10):1550–1560, 1990.
[41] Ronald J Williams and David Zipser. Gradient-based learning algorithms for recurrent. Backpropagation: Theory, architectures, and applications, 433, 1995.
[42] Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144, 2016.
[43] Feng Zhang, Jidong Zhai, Marc Snir, Hai Jin, Hironori Kasahara, and Mateo Valero. Guest editorial: Special issue on network and parallel computing for emerging architectures and applications, 2019.
[44] Shunlu Zhang, Pavan Gunupudi, and Qi-Jun Zhang. Parallel back-propagation neural network training technique using cuda on multiple gpus. In IEEE MTT-S International Conference on Numerical Electromagnetic and Multiphysics Modeling and Optimization, pages 1–3. IEEE, 2015.

Uwagi

Opracowanie rekordu ze środków MNiSW, umowa Nr 461252 w ramach programu "Społeczna odpowiedzialność nauki" - moduł: Popularyzacja nauki i promocja sportu (2021).

Typ dokumentu

Bibliografia

Identyfikator YADDA

bwmeta1.element.baztech-bcb9cabb-8209-4c52-909c-5b4a25dda07b