Studying OpenMP thread mapping for parallel linear algebra kernels on multicore system

Bylina, B.; Bylina, J.

doi:10.24425/bpas.2018.125800

Content

Full texts:

24_981-990_00734_Bpast.No.66-6_31.12.18_K2.pdf

Abstract

Thread mapping is one of the techniques that allow the potential of modern multicore architectures to be exploited efficiently. The aim of this paper is to study the impact of thread mapping on the computing performance, scalability, and energy consumption of parallel dense linear algebra kernels on hierarchical shared-memory multicore systems. We consider a basic application, namely the matrix-matrix product (GEMM), and two parallel matrix decompositions (LU and WZ). Both factorizations rely on parallel BLAS (Basic Linear Algebra Subprograms) operations, GEMM among them. We compare various thread mapping strategies for these applications. Our results show that the choice of thread mapping has a measurable impact on the performance, scalability, and energy consumption of GEMM and of the two matrix factorizations.
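To make the notion of a thread mapping strategy concrete, the following is a minimal sketch of two common placement patterns on a hierarchical machine. It is a simplified model only: the function names, the strategy labels `compact` and `scatter`, and the machine shape (sockets times cores per socket) are assumptions made for illustration, and the code is neither the method used in the paper nor the exact algorithm of any OpenMP runtime. In real OpenMP programs, comparable behaviour is usually requested through the `OMP_PLACES` and `OMP_PROC_BIND` environment variables (for example `OMP_PROC_BIND=close` or `OMP_PROC_BIND=spread`).

```python
from collections import Counter


def map_threads(num_threads, sockets, cores_per_socket, strategy):
    """Return one (socket, core) pair per thread under a simplified model.

    'compact' fills the cores of one socket before moving to the next;
    'scatter' deals threads round-robin across sockets.
    Both names are illustrative labels, not OpenMP keywords.
    """
    total_cores = sockets * cores_per_socket
    placement = []
    for t in range(num_threads):
        if strategy == "compact":
            # Consecutive threads land on consecutive cores.
            core_id = t % total_cores
            placement.append((core_id // cores_per_socket,
                              core_id % cores_per_socket))
        elif strategy == "scatter":
            # Consecutive threads land on different sockets.
            placement.append((t % sockets,
                              (t // sockets) % cores_per_socket))
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return placement


def threads_per_socket(placement):
    """Count how many threads were placed on each socket."""
    return dict(Counter(socket for socket, _ in placement))


if __name__ == "__main__":
    for strategy in ("compact", "scatter"):
        placement = map_threads(4, sockets=2, cores_per_socket=4,
                                strategy=strategy)
        print(strategy, placement, threads_per_socket(placement))
```

With four threads on a hypothetical machine of two sockets with four cores each, the compact pattern keeps all threads on one socket, where they can share its cache, while the scatter pattern puts two threads on each socket, which gives them access to more aggregate cache and memory bandwidth. Trade-offs of this kind are what a thread mapping study measures.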

Keywords

computation performance; OpenMP standard; nonnegative matrix factorization; thread mapping; energy consumption

Polish keywords (translated): computing performance; OpenMP standard; nonnegative matrix factorization; mapping; energy consumption

Publisher

Polska Akademia Nauk, Wydział IV Nauk Technicznych

Journal

Bulletin of the Polish Academy of Sciences. Technical Sciences

Year

2018

Volume

Vol. 66, no. 6

Pages

981–990

Physical description

17 references, figures, charts, tables.

Contributors

author

Bylina B.

beata.bylina@umcs.pl

Marie Curie-Skłodowska University, Institute of Mathematics, Pl. M. Curie-Skłodowskiej 5, 20-031 Lublin, Poland

author

Bylina J.

Marie Curie-Skłodowska University, Institute of Mathematics, Pl. M. Curie-Skłodowskiej 5, 20-031 Lublin, Poland

References

[1] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen: LAPACK Users’ Guide, Society for Industrial and Applied Mathematics, Philadelphia, PA, Third Edition, 1999.
[2] A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, “A class of parallel tiled linear algebra algorithms for multicore architectures”, Parallel Computing, 35 (1), 38–53 (2009).
[3] B. Bylina, “The Block WZ factorization”, Journal of Computational and Applied Mathematics 331, 119–132 (2018).
[4] B. Bylina and J. Bylina, “Incomplete WZ factorization as an alternative method of preconditioning for solving Markov chains”, PPAM, volume 4967 of Lecture Notes in Computer Science, 99–107 (2007).
[5] B. Bylina and J. Bylina, “Influence of preconditioning and blocking on accuracy in solving Markovian models”, Applied Mathematics and Computer Science, 19 (2), 207–217 (2009).
[6] B. Bylina and J. Bylina, “OpenMP thread affinity for matrix factorization on multicore systems”, Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, volume 11 of Annals of Computer Science and Information Systems, 489–492 (2017).
[7] S. Chandra Sekhara Rao, “Existence and uniqueness of WZ factorization”, Parallel Computing, 23 (8), 1129–1139 (1997).
[8] M. Diener, E. H. M. Cruz, M. A. Z. Alves, P. O. A. Navaux, and I. Koren, “Affinity-based thread and data mapping in shared memory systems”, ACM Comput. Surv., 49 (4), 64:1–64:38 (Dec. 2016).
[9] J. Dongarra, J. DuCroz, I. S. Duff, and S. Hammarling, “A set of level-3 Basic Linear Algebra Subprograms”, ACM Trans. Math. Software, 16, 1–28 (1990).
[10] J. Dongarra, H. Ltaief, P. Luszczek, and V. M. Weaver, “Energy footprint of advanced dense numerical linear algebra using tile algorithms on multicore architectures”, 2012 Second International Conference on Cloud and Green Computing, 274–281 (Nov. 2012).
[11] D. J. Evans and M. Hatzopoulos, “A parallel linear system solver”, International Journal of Computer Mathematics, 7 (3), 227–238 (1979).
[12] M. J. Flynn, “Some computer organizations and their effectiveness”, IEEE Trans. Comput., 21 (9), 948–960 (Sep. 1972).
[13] E. Rotem, A. Naveh, A. Ananthakrishnan, E. Weissmann, and D. Rajwan, “Power-management architecture of the Intel microarchitecture code-named Sandy Bridge”, IEEE Micro, 32 (2), 20–27 (Mar. 2012).
[14] M. Weiland and N. Johnson, “Benchmarking for power consumption monitoring”, Computer Science – Research and Development, 30 (2), 155–163 (May 2015).
[15] P. Yalamov and D. J. Evans, “The WZ matrix factorisation method”, Parallel Computing, 21 (7), 1111–1120 (1995).
[16] Intel Math Kernel Library, 2014. http://software.intel.com/en-us/articles/intel-mkl/
[17] OpenMP Architecture Review Board: OpenMP application program interface version 4.5, May 2015.

Notes

Record prepared under agreement 509/P-DUN/2018 with funds from the Ministry of Science and Higher Education (MNiSW) allocated to science dissemination activities (2019).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-c600b1ad-cff0-461b-9b40-86ee18ad60f5