Hardware-Efficient Structure of the Accelerating Module for Implementation of Convolutional Neural Network Basic Operation

Cariow, A.; Cariowa, G.

Abstract

This paper presents a structural design of a hardware-efficient module that implements the basic operation of a convolutional neural network (CNN) with reduced implementation complexity. For this purpose we use a modification of Winograd's minimal filtering method together with computation vectorization principles. The module calculates the inner products of two consecutive segments of the original data sequence, formed by a sliding window of length 3, with the elements of a filter impulse response. A fully parallel structure that calculates these two inner products by the naïve method requires 6 binary multipliers and 4 binary adders. Using Winograd's minimal filtering method makes it possible to construct a module structure that requires only 4 binary multipliers and 8 binary adders. Since a high-performance convolutional neural network can contain tens or even hundreds of such modules, this reduction can have a significant effect.
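The operation counts quoted in the abstract can be checked with a short sketch. It assumes the commonly cited F(2,3) form of Winograd's minimal filtering algorithm (as given, for example, in reference [17]); the authors' own modification may differ in detail. The function names `naive_pair`, `precompute_filter`, and `winograd_pair` are hypothetical, and the filter-side sums are assumed to be precomputed once, so they are not counted among the 8 additions.

```python
from fractions import Fraction


def naive_pair(d, g):
    """Two sliding-window inner products: 6 multiplications, 4 additions."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    y0 = d0 * g0 + d1 * g1 + d2 * g2
    y1 = d1 * g0 + d2 * g1 + d3 * g2
    return y0, y1


def precompute_filter(g):
    """Filter-side transform, done once per filter (assumed precomputed)."""
    g0, g1, g2 = g
    half = Fraction(1, 2)
    return g0, half * (g0 + g1 + g2), half * (g0 - g1 + g2), g2


def winograd_pair(d, u):
    """Standard F(2,3) form: 4 multiplications, 8 additions per output pair."""
    d0, d1, d2, d3 = d
    u0, u1, u2, u3 = u
    # 4 data-side additions/subtractions, 4 multiplications
    m1 = (d0 - d2) * u0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * u3
    # 4 output-side additions/subtractions
    y0 = m1 + m2 + m3
    y1 = m2 - m3 - m4
    return y0, y1


if __name__ == "__main__":
    data = (1, 2, 3, 4)      # four consecutive input samples
    taps = (5, 6, 7)         # filter impulse response of length 3
    print(naive_pair(data, taps))
    print(winograd_pair(data, precompute_filter(taps)))
```

Both functions return the same pair of outputs; the Winograd form trades two multiplications for four extra additions, which is the trade-off the abstract describes for the hardware module.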

Keywords

convolution neural network; Winograd's minimal filtering algorithm; implementation complexity reduction; FPGA implementation

Publisher

Wydawnictwo PAK

Journal

Measurement Automation Monitoring

Year

2018

Volume

Vol. 64, No. 2

Pages

40–42

Physical description

Bibliography: 20 items; figures; formulas

Contributors

author

Cariow A.

acariow@wi.zut.edu.pl

West Pomeranian University of Technology, Szczecin, 49 Żołnierska St., 71-210 Szczecin, Poland

author

Cariowa G.

gcariowa@wi.zut.edu.pl

West Pomeranian University of Technology, Szczecin, 49 Żołnierska St., 71-210 Szczecin, Poland

References

[1] Krizhevsky A., Sutskever I. and Hinton G. E.: ImageNet classification with deep convolutional neural networks, in Proceedings of the 26th Annual Conference on Neural Information Processing Systems (NIPS ’12), pp. 1097–1105, Lake Tahoe, Nev, USA, December 2012.
[2] Farabet C., Martini B., Akselrod P., Talay S., LeCun Y. and Culurciello E.: Hardware accelerated convolutional neural networks for synthetic vision systems, in Proceedings of 2010 IEEE International Symposium on Circuits and Systems. 2010, pp. 257–260.
[3] Zhao R., Song W., Zhang W., Xing T., Lin J. H., Srivastava M., Gupta R. and Zhang Z.: Accelerating binarized convolutional neural networks with software-programmable FPGAs, in Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017, pp. 15–24.
[4] Zhang C., Li P., Sun G., Guan Y., Xiao B., Cong J.: Optimizing FPGA-based accelerator design for deep convolutional neural networks. Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays: ACM, USA, 2015; 161–170.
[5] Škoda P., Lipić T., Srp Á., Rogina B. M., Skala K. and Vajda F.: Implementation framework for artificial neural networks on FPGA, in 2011 Proceedings of the 34th International Convention MIPRO, May 2011, pp. 274–278.
[6] Cadambi S., Majumdar A., Becchi M., Chakradhar S. and Graf H. P.: A programmable parallel accelerator for learning and classification, in Proceedings of the 19th international conference on Parallel architectures and compilation techniques, ser. PACT ’10. New York, NY, USA: ACM, 2010, pp. 273–284.
[7] Qadeer W., Hameed R., Shacham O., Venkatesan P., Kozyrakis C. and Horowitz M. A.: Convolution engine: balancing efficiency & flexibility in specialized computing, in ACM SIGARCH Computer Architecture News, vol. 41, no. 3. ACM, 2013, pp. 24–35.
[8] Chakradhar S., Sankaradas M., Jakkula V. and Cadambi S.: A dynamically configurable coprocessor for convolutional neural networks. SIGARCH Comput. Archit. News, June 2010, 38(3), pp. 247–257.
[9] Farabet C., Poulet C., Han J. Y., LeCun Y.: CNP: an FPGA-based processor for convolutional networks. FPL 2009. International Conference on Field Programmable Logic and Applications, 2009: IEEE, Prague, Czech Republic, 2009, pp. 32–37.
[10] Farabet C., Martini B., Akselrod P., Talay S., LeCun Y. and Culurciello E.: Hardware accelerated convolutional neural networks for synthetic vision systems, in Circuits and Systems (ISCAS), Proceedings of 2010 IEEE International Symposium, 2010, pp. 257–260.
[11] Ovtcharov K., Ruwase O., Kim J. Y., Fowers J., Strauss K., Chung E. S.: Accelerating deep convolutional neural networks using specialized hardware. Microsoft Research Whitepaper: Microsoft Research, 2015.
[12] Li Y., Liu Z., Xu K., Yu H. and Ren F.: A 7.663-tops 8.2-w energy efficient fpga accelerator for binary convolutional neural networks, arXiv:1702.06392, Feb 2017.
[13] Chen Y. H., Krishna T., Emer J. S. and Sze V.: Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks, IEEE Journal of Solid-State Circuits, vol. 52, no. 1, pp. 127–138, 2017.
[14] Qiu J., Wang J., Yao S., Guo K., Li B., Zhou E., Yu J., Tang T., Xu N., Song S., Wang Y., Yang H.: Going Deeper with Embedded FPGA Platform for Convolutional Neural Network, FPGA ’16, ACM, 2016, pp. 26–35.
[15] Zhang C., Li P., Sun G., Guan Y., Xiao B. and Cong J.: Optimizing fpga-based accelerator design for deep convolutional neural networks, ACM, 2015, pp. 161–170.
[16] Cong J. and Xiao B.: Minimizing computation in convolutional neural networks, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2014, vol. 8681, pp. 281–290.
[17] Lavin A. and Gray S.: Fast algorithms for convolutional neural networks, In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016, pp. 4013–4021.
[18] Li H., Fan X., Jiao L., Cao W., Zhou X. and Wang L.: A high performance FPGA-based accelerator for large-scale convolutional neural networks, in 26th International Conference on Field Programmable Logic and Applications (FPL), 29 Aug.–2 Sept. 2016, Lausanne, Switzerland, pp. 1–9, DOI: 10.1109/FPL.2016.7577308
[19] Qiang Lan, Zelong Wang, Mei Wen, Chunyuan Zhang and Yijie Wang: High Performance Implementation of 3D Convolutional Neural Networks on a GPU. Computational Intelligence and Neuroscience. Hindawi, vol. 2017, pp. 1-8.
[20] Lu L., Liang Y., Xiao Q., Yan S.: Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs, 25th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines, 2017, pp. 101-108, DOI: 10.1109/FCCM.2017.64

Notes

Record prepared under agreement 509/P-DUN/2018, financed from funds of the Ministry of Science and Higher Education (MNiSW) allocated to science dissemination activities (2019).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-f0a81e44-f756-487c-aa6c-bcd41ef1e39d