FPGA implementations of low precision floating point multiply-accumulate

Amaricai, A.; Boncalo, O.; Sicoe, O.

Abstract

Floating point (FP) multiply-accumulate (MAC) is one of the most important operations in a wide range of applications, such as DSP, multimedia, and graphics processing. This paper presents a half-precision (16-bit) FP MAC implementation for FPGAs. The main contribution of this work is the use of a modern FPGA DSP block to perform both the mantissa multiplication and the mantissa accumulation. To make this possible, the alignment shifts are performed before the multiply-add stage: a right shift on one of the multiplicands and a left shift on the other. This results in efficient DSP usage, targeting both cost savings and higher performance (high working frequencies and low latencies) for MAC operations.
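
The idea of aligning before the multiply-add stage can be illustrated with a small integer model. This is only a sketch under stated assumptions, not the authors' architecture: the function names, the signed `shift` parameter, and the choice of which multiplicand receives which shift are hypothetical, and mantissas are modelled as plain non-negative integers. It shows that shifting a multiplicand before a single `A*B + C` step gives the same result as shifting the product afterwards for left shifts, and a slightly truncated result for right shifts.

```python
def mac_prealigned(ma, mb, shift, acc):
    """One DSP-style step P = A*B + C with alignment applied to the inputs.

    shift > 0: the product must move left, so one multiplicand is left-shifted.
    shift < 0: the product must move right, so the other multiplicand is
               right-shifted (low bits are truncated before multiplying).
    All names and the signed-shift convention are assumptions of this sketch.
    """
    if shift >= 0:
        a, b = ma << shift, mb
    else:
        a, b = ma, mb >> -shift
    return a * b + acc  # single multiply-add, as a DSP block would compute


def mac_postaligned(ma, mb, shift, acc):
    """Reference model: multiply first, align the product, then accumulate."""
    p = ma * mb
    p = p << shift if shift >= 0 else p >> -shift
    return p + acc


# Two 11-bit mantissas (half precision has an 11-bit significand
# once the hidden bit is included).
ma, mb = 0b10110011001, 0b11100000101

# Left shift: both orderings agree exactly.
left_pre = mac_prealigned(ma, mb, 3, 100)
left_post = mac_postaligned(ma, mb, 3, 100)

# Right shift: pre-alignment drops low bits of mb, so the result can be
# smaller than the post-aligned one, by less than ma.
right_pre = mac_prealigned(ma, mb, -2, 100)
right_post = mac_postaligned(ma, mb, -2, 100)
```

In this model the left-shift case is exact because `(ma << s) * mb == (ma * mb) << s`, while the right-shift case trades a bounded truncation error for keeping the whole operation inside one multiply-add.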

Keywords

digital arithmetic; floating point arithmetic; FPGA; field programmable gate array (FPGA); multiply-accumulate (MAC); dot product

Publisher

Lodz University of Technology. Department of Microelectronics and Computer Science

Journal

International Journal of Microelectronics and Computer Science

Year

2013

Volume

Vol. 4, no. 4

Pages

159-163

Physical description

Bibliography: 17 items.

Contributors

author

Amaricai A.

alexandru.amaricai@cs.upt.ro

Department of Computer Engineering, University Politehnica of Timisoara, Timisoara, Romania

author

Boncalo O.

Department of Computer Engineering, University Politehnica of Timisoara, Timisoara, Romania

author

Sicoe O.

Department of Computer Engineering, University Politehnica of Timisoara, Timisoara, Romania

Bibliography

[1] A. Amaricai, O. Boncalo, C.E. Gavriliu, “Low Precision DSP Based Floating Point Multiply-Add Fused for FPGAs”, submitted to IET Computing & Digital Techniques, 2013
[2] F. Bensaali, A. Amira, R. Sotudeh, “Floating-point matrix product on FPGA”, Proc. IEEE/ACS Int. Conf. on Computer Systems and Applications, pp. 466-473, 2007
[3] F. de Dinechin, B. Pasca, “Designing custom arithmetic data paths with FloPoCo”, IEEE Design and Test of Computers, Vol. 28, Issue 4, pp. 18-27, 2011
[4] F. de Dinechin, B. Pasca, O. Cret, R. Tudoran, “An FPGA-specific approach to floating-point accumulation and sum-of-products”, Proc. 2008 Int. Conf. on Field Programmable Technology (FPT), pp. 33-40, 2008
[5] K.S. Hemmert, K.D. Underwood, “Fast, Efficient Floating-Point Adders and Multipliers for FPGAs”, ACM Trans. on Reconfigurable Technology and Systems (TRETS), Vol. 3, Issue 3, Art. No. 11, 2010
[6] B. Holanda, R. Pimentel, J. Barbosa, R. Camarotti, A. Silva-Filho, L. Joao, V. Souza, J. Ferraz, M. Lima, “An FPGA-Based Accelerator to Speed-Up Matrix Multiplication of Floating Point Operations”, Proc. 2011 IEEE Int. Symp. on Parallel and Distributed Processing Workshops and PhD Forum, pp. 306-309, 2011
[7] M. K. Jaiswal, R.C.C. Cheung, “Area-efficient architectures for double precision multiplier on FPGA, with run-time-reconfigurable dual single precision support”, Microelectronics Journal, Vol. 44, Issue 5, pp. 421-430, 2013
[8] Z. Jovanovic, V. Milutinovic, “FPGA accelerator for floating-point matrix multiplication”, IET Computer and Digital Techniques, Vol. 6, Issue 4, pp. 249-256, 2012
[9] T. Lang, J.D. Bruguera, “Floating-Point Fused Multiply-Add with Reduced Latency”, Proc. 2002 IEEE Int. Conf. on Computer Design (ICCD), pp. 145-150, 2002
[10] M. de Lorimier, A. DeHon, “Floating-Point Sparse Matrix-Vector Multiply for FPGAs”, Proc. ACM/SIGDA 13th Int. Symp. on Field Programmable Gate Arrays (FPGA), pp. 75-85, 2005
[11] A.R. Lopes, G. Constantinides, “A fused hybrid floating-point and fixed-point dot-product for FPGAs”, Proc. 6th Int. Conf. on Reconfigurable Computing: Architectures, Tools and Applications (ARC’10), pp. 157-168, 2010
[12] V.G. Oklobdzija, “An algorithmic and novel design of a leading Zero detector circuit: Comparison with logic synthesis”, IEEE Trans. on VLSI Systems, Vol. 2, Issue 1, pp. 124-128, 1994
[13] A. Paidimari, A. Cevrero, P. Brisk, P. Ienne, “FPGA Implementation of a Single-Precision Floating-Point Multiply-Accumulator with Single-Cycle Accumulation”, Proc. 17th IEEE Symp. on Field Programmable Custom Computing Machines (FCCM), pp. 267-270, 2009
[14] S. Sun, J. Zambreno, “A Floating-point Accumulator for FPGA-based High Performance Computing Applications”, Proc. 2009 Int. Conf. on Field Programmable Technology (FPT), pp. 493-499, 2009
[15] Y. Tao, G. Deyuan, D. Xiaoya, J. Nurmi, “Correctly Rounded Architectures for Floating-Point Multi-Operand Addition and Dot-Product Computation”, Proc. 24th IEEE Conf. on Application-Specific Systems, Architectures and Processors (ASAP), pp. 346-355, 2013
[16] Xilinx, “Virtex-5 FPGA XtremeDSP Design Considerations”, User Guide, 2012
[17] A.M. Zaki, M.H. El-Shafey, A.M.B. Eldin, G.M. Ali, “A New Architecture for Accurate Dot Product of Floating Point Numbers”, Proc. 2010 Int. Conf. on Computer Engineering and Systems, pp. 139-145, 2010

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-f44ba2e3-bb1f-4dde-877b-a9d2aa954374