Tuning matrix-vector multiplication on GPU

Dziekoński, A.; Mrozowski, M.

Article - details

Article title

Tuning matrix-vector multiplication on GPU

Authors

Dziekoński A., Mrozowski M.

Identifiers

Title variants

Dostosowanie mnożenia macierzy przez wektor do wykonania tej operacji na GPU (Adapting matrix-vector multiplication for execution on a GPU)

Publication languages

Abstracts

Matrix-vector multiplication (matvec) is a cornerstone operation in iterative methods for solving large sparse systems of equations, such as the conjugate gradient method (CG), the minimal residual method (MINRES) and the generalized minimal residual method (GMRES), and it affects the overall performance of those methods. Implementing matvec is particularly demanding when computations are executed on a GPU (Graphics Processing Unit), because on this device one has to comply with certain programming rules in order to take advantage of parallel computing. This paper shows how to modify sparse matrix-vector multiplication based on CRS (Compressed Row Storage) to achieve about 3-5 times better performance on a low-cost GPU (GeForce GTX 285, 1.48 GHz) than on a CPU (Intel Core i7, 2.67 GHz).

Matrix-vector multiplication is a key operation in iterative methods (e.g. the conjugate gradient method, MINRES, GMRES) used to solve large sparse systems of equations, because the execution of this operation affects the overall performance of these methods. Moreover, implementing this operation on a GPU requires following fairly restrictive rules that stem from the specifics of the graphics accelerator architecture. This publication presents several modifications of sparse matrix-vector multiplication using CRS (Compressed Row Storage) compression on a GPU, and compares the execution times obtained on a GPU (GeForce GTX 285, 1.48 GHz) and on a CPU (Intel Core i7, 2.67 GHz).
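
For orientation, the baseline operation named in the abstract can be sketched as follows. This is a minimal CPU reference for y = A·x with A held in CRS, not the tuned GPU kernel from the paper; the function name `crs_matvec` and the array names `values`, `col_idx` and `row_ptr` are illustrative assumptions, not taken from the article.

```python
def crs_matvec(values, col_idx, row_ptr, x):
    """Reference y = A * x for a matrix A stored in CRS.

    values  : nonzero entries of A, stored row by row
    col_idx : column index of each entry in `values`
    row_ptr : row_ptr[i] .. row_ptr[i + 1] is the slice of row i
    (All names here are hypothetical, chosen for illustration.)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    # Rows are independent of each other, which is what makes the
    # operation a candidate for parallel execution.
    for row in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


# Small example matrix:
# [[4, 0, 1],
#  [0, 3, 0],
#  [2, 0, 5]]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]

y = crs_matvec(values, col_idx, row_ptr, x)
print(y)
```

Because each row's dot product is independent, the outer loop is the natural place to parallelize; how the rows are mapped to GPU threads and how memory accesses are arranged is the kind of tuning the abstract refers to, and is not reproduced here.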

Keywords

matrix-vector multiplication GPU

mnożenie macierzy GPU (matrix multiplication GPU)

Publisher

Wydział Elektroniki, Telekomunikacji i Informatyki Politechniki Gdańskiej

Journal

Zeszyty Naukowe Wydziału ETI Politechniki Gdańskiej. Technologie Informacyjne

Year

2010

Volume

Vol. 18

Pages

307-312

Physical description

Bibliography: 12 items, figures, tables.

Contributors

author

Dziekoński A.

author

Mrozowski M.

Gdansk University of Technology, Department of Microwave and Antenna Engineering

Bibliography

[1] Taflove A.: Computational electrodynamics FDTD method, 2nd ed., Norwood: Artech House, 2000.
[2] Sadiku M. O.: Numerical Techniques in Electromagnetics, 2nd ed. CRC Press, 2001.
[3] Saad Y.: Iterative Methods for Sparse Linear Systems. Boston: SIAM, 2003.
[4] Bai Z., Demmel J., Dongarra J., Ruhe A. and Vorst H.: Templates for the Solution of Sparse Eigenvalue Problems. Philadelphia: SIAM, 2000.
[5] http://gpgpu.org
[6] http://www.gpucomputing.eu
[7] Bell N. and Garland M.: Implementing sparse matrix-vector multiplication on throughput-oriented processors, SC ’09: Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis, 2009.
[8] Bell N. and Garland M.: Efficient sparse matrix-vector multiplication on CUDA. NVIDIA Technical Report NVR-2008-004, NVIDIA Corporation, Dec. 2008.
[9] Baskaran M. M. and Bordawekar R.: Optimizing Sparse Matrix-Vector Multiplication on GPUs Using Compile-time and Run-time Strategies. IBM Research Report RC24704, IBM, Apr. 2009.
[10] Vázquez F., Garzón E. M., Martínez J. A., Fernández J. J.: The sparse matrix vector product on GPUs, Proceedings of the 2009 International Conference on Computational and Mathematical Methods in Science and Engineering, volume 2, pages 1081-1092. CMMSE, Gijón (Spain), July 2009.
[11] Programming Guide Version 2.1 Nvidia Co. 2008.
[12] Dziekonski A., Sypek P., Kulas L. and Mrozowski M.: Implementation of matrix-type FDTD algorithm on a graphics accelerator. Microwaves, Radar and Wireless Communications, MIKON, 2008.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BPG8-0033-0048