Identifiers
Title variants
Publication languages
Abstracts
Field-programmable gate array (FPGA) technology can offer significantly higher performance at much lower power consumption than single- and multi-core CPUs and GPUs (graphics processing units) in many computational problems. Unfortunately, programming FPGAs directly in hardware description languages (HDLs) such as VHDL or Verilog is a difficult, non-trivial task and is not intuitive for C/C++/Java programmers. To bridge the gap between programming effectiveness and difficulty, the major FPGA vendors promote the high-level synthesis (HLS) approach. Nowadays, time-intensive calculations are performed mainly on GPU/CPU architectures, but they can also be performed successfully using the HLS approach. In this paper we implement a bandwidth selection algorithm for kernel density estimation (KDE) using HLS and show the techniques used to optimize the final FPGA implementation. We also show that the FPGA speedups, compared to highly optimized CPU and GPU implementations, are quite substantial. Moreover, the power consumption of FPGA devices is usually much lower than the typical power consumption of present-day CPUs and GPUs.
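As a point of reference, and assuming the selector is the univariate two-stage direct plug-in (PLUGIN) rule described by Wand and Jones [28] with a Gaussian kernel, a minimal pure-Python sketch is given below. The helper names (`phi4`, `phi6`, `psi_hat`, `plugin_bandwidth`) are hypothetical and are not taken from the PLUGIN sources [30]. The nested loop in `psi_hat` evaluates O(n^2) kernel-derivative terms and dominates the running time, which is what makes such a selector a natural candidate for CPU, GPU, or FPGA acceleration.

```python
import math
import random

SQRT_2PI = math.sqrt(2.0 * math.pi)


def phi4(x: float) -> float:
    """4th derivative of the standard normal pdf: (x^4 - 6x^2 + 3) * phi(x)."""
    x2 = x * x
    return (x2 * x2 - 6.0 * x2 + 3.0) * math.exp(-0.5 * x2) / SQRT_2PI


def phi6(x: float) -> float:
    """6th derivative of the standard normal pdf: (x^6 - 15x^4 + 45x^2 - 15) * phi(x)."""
    x2 = x * x
    return (x2 ** 3 - 15.0 * x2 * x2 + 45.0 * x2 - 15.0) * math.exp(-0.5 * x2) / SQRT_2PI


def psi_hat(data, g, deriv, r):
    """Kernel estimate of the density functional psi_r with pilot bandwidth g:
    n^-2 * g^-(r+1) * sum_i sum_j deriv((X_i - X_j) / g).
    This O(n^2) double sum dominates the cost of the whole selector."""
    n = len(data)
    total = 0.0
    for xi in data:
        for xj in data:
            total += deriv((xi - xj) / g)
    return total / (n * n * g ** (r + 1))


def plugin_bandwidth(data):
    """Two-stage direct plug-in bandwidth for a Gaussian-kernel KDE (sketch)."""
    n = len(data)
    mean = sum(data) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    # Normal-scale estimate of psi_8 starts the recursion.
    psi8 = 105.0 / (32.0 * math.sqrt(math.pi) * sigma ** 9)
    # Pilot bandwidth for psi_6, using -2 * phi^(6)(0) = 30 / sqrt(2*pi).
    g1 = (30.0 / (SQRT_2PI * psi8 * n)) ** (1.0 / 9.0)
    psi6 = psi_hat(data, g1, phi6, 6)
    if psi6 >= 0.0:
        raise ValueError("psi_6 estimate must be negative")
    # Pilot bandwidth for psi_4, using -2 * phi^(4)(0) = -6 / sqrt(2*pi).
    g2 = (-6.0 / (SQRT_2PI * psi6 * n)) ** (1.0 / 7.0)
    psi4 = psi_hat(data, g2, phi4, 4)
    if psi4 <= 0.0:
        raise ValueError("psi_4 estimate must be positive")
    # AMISE-optimal bandwidth with R(K) = 1 / (2 sqrt(pi)) and mu_2(K) = 1.
    return (1.0 / (2.0 * math.sqrt(math.pi) * psi4 * n)) ** 0.2


if __name__ == "__main__":
    rng = random.Random(42)
    sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
    print(f"h = {plugin_bandwidth(sample):.4f}")
```

For a standard normal sample of size 200 the result should lie near the normal-reference value 1.06 · σ · n^(-1/5) ≈ 0.37. The selector is scale-equivariant: multiplying the data by a constant multiplies the bandwidth by the same constant.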
Year
Volume
Pages
821–829
Physical description
Bibliography: 30 items, figures, tables, charts
Authors
author
- Institute of Control and Computation Engineering, University of Zielona Góra, Licealna 9 St., 65-417 Zielona Góra, Poland
author
- Institute of Control and Computation Engineering, University of Zielona Góra, Licealna 9 St., 65-417 Zielona Góra, Poland
author
- Computer Center, University of Zielona Góra, Licealna 9 St., 65-417 Zielona Góra, Poland
Bibliography
- [1] W. Andrzejewski, A. Gramacki and J. Gramacki, “Graphics processing units in acceleration of bandwidth selection for kernel density estimation”, Int. J. Appl. Math. Comput. Sci. 23(4), 869–885 (2013).
- [2] J. Bachrach, H. Vo, B. Richards, Y. Lee, A. Waterman, R. Avizienis, J. Wawrzynek and K. Asanović, “Chisel: constructing hardware in a Scala embedded language”, Design Automation Conference IEEE, 1212–1221 (2012).
- [3] J. E. Chacón and T. Duong, “Multivariate plug-in bandwidth selection with unconstrained pilot bandwidth matrices”, TEST (Springer) 19(2), 375–398 (2010).
- [4] J. E. Chacón and T. Duong, “Unconstrained pilot selectors for smoothed cross validation”, Australian & New Zealand Journal of Statistics 53, 331–351 (2011).
- [5] J. E. Chacón and T. Duong, “Efficient recursive algorithms for functionals based on higher order derivatives of the multivariate Gaussian density”, Statistics and Computing 25, 959–974 (2015).
- [6] J. E. Volder, “The CORDIC trigonometric computing technique”, IRE Transactions on Electronic Computers EC-8, 330–334 (1959).
- [7] J. S. Walther, “A unified algorithm for elementary functions”, Proc. of Spring Joint Computer Conference, 379–385 (1971).
- [8] P. Coussy and A. Morawiec, High-Level Synthesis From Algorithm to Digital Circuit, Springer, Heidelberg (2008).
- [9] N. Daili and A. Guesmia, “Remez algorithm applied to the best uniform polynomial approximations”, Gen. Math. Notes 17(1), 16–31 (2013).
- [10] S. A. Fahmy and A. R. Mohan, “Architecture for real-time nonparametric probability density function estimation”, IEEE Transactions on Very Large Scale Integration (VLSI) Systems 21(5), 910–920 (2013).
- [11] I. Grobelna, R. Wiśniewski, M. Grobelny and M. Wiśniewska, “Design and verification of real-life processes with application of Petri nets”, IEEE Transactions on Systems, Man, and Cybernetics: Systems PP(99), 1–14, DOI: http://dx.doi.org/10.1109/TSMC.2016.2531673 (2016).
- [12] M. C. Jones, J. S. Marron and S. J. Sheather, “A brief survey of bandwidth selection for density estimation”, Journal of the American Statistical Association 91(433), 401–407 (1996).
- [13] P. Kulczycki, Kernel Estimators in Systems Analysis, Wydawnictwo Naukowo-Techniczne, Warsaw (2005) [in Polish].
- [14] P. Kulczycki and M. Charytanowicz, “A complete gradient clustering algorithm formed with kernel estimators”, Int. J. Appl. Math. Comput. Sci. 20(1), 123–134 (2010).
- [15] Y. Lei, Y. Dou, Y. Dong, J. Zhou and F. Xia, “FPGA implementation of an exact dot product and its application in variable-precision floating-point arithmetic”, J. Supercomput. 64(2), 580–605 (2013).
- [16] J. Matai, D. Richmond, D. Lee and R. Kastner, “Enabling FPGAs for the masses”, 1st Int. Workshop on FPGAs for Software Programmers, Munich, arXiv:1408.5870 (2014).
- [17] E. P. Ferlin, H. S. Lopes, C. R. Erig Lima and M. Perretto, “PRADA: a high-performance reconfigurable parallel architecture based on the dataflow model”, Int. J. of High Performance Systems Architecture 3(1), 41–55 (2011).
- [18] A. Pułka and A. Milik, “An efficient hardware implementation of Smith-Waterman algorithm based on the incremental approach”, International Journal of Electronics and Telecommunications 57(4), 489–496 (2011).
- [19] E. Y. Remez, “Sur la détermination des polynômes d’approximation de degré donnée”, Comm. Soc. Math. Kharkov 10, 41–63 (1934) [in French].
- [20] M. Sawerwain and R. Gielerak, “GPGPU based simulations for one and two dimensional quantum walks”, Computer Networks: 17th Conference, Ustroń, 29–38 (2010).
- [21] D. W. Scott, Multivariate Density Estimation: Theory, Practice, and Visualization, John Wiley & Sons, Inc. (1992).
- [22] B. W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman & Hall (1986).
- [23] J. S. Simonoff, Smoothing Methods in Statistics, Springer (1996).
- [24] J. Spiechowicz, M. Kostur and L. Machura, “GPU accelerated Monte Carlo simulation of Brownian motors dynamics with CUDA”, Computer Physics Communications 191, 140–149 (2015).
- [25] P. Steffen, R. Giegerich and M. Giraud, “GPU parallelization of algebraic dynamic programming”, PPAM 2009, LNCS 6068, 290–299 (2010).
- [26] “Synflow Cx”, www.synflow.com, last accessed April 2015.
- [27] S. Taherkhani, E. Ever and O. Gemikonakli, “Implementation of non-pipelined and pipelined data encryption standard (DES) using Xilinx Virtex-6 FPGA technology”, IEEE 10th International Conference on Computer and Information Technology, 1257–1262 (2010).
- [28] M. P. Wand and M. C. Jones, Kernel Smoothing, Chapman & Hall (1995).
- [29] B. Wyrwoł and E. Hryniewicz, “Decomposition of the fuzzy inference system for implementation in the FPGA structure”, Int. J. Appl. Math. Comput. Sci. 23(2), 473–483 (2013).
- [30] “The PLUGIN source codes”, https://github.com/qMSUZ/plugin (2016).
Notes
PL
Prepared with funds from the Ministry of Science and Higher Education (MNiSW) under agreement 812/P-DUN/2016 for science-dissemination activities.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-8f1ab760-1f67-46d6-a0e2-6078b67c0c65