Article title

Studying OpenMP thread mapping for parallel linear algebra kernels on multicore system

Publication languages
EN
Abstracts
EN
Thread mapping is one of the techniques that make it possible to exploit the potential of modern multicore architectures efficiently. The aim of this paper is to study the impact of thread mapping on the computing performance, scalability, and energy consumption of parallel dense linear algebra kernels on hierarchical shared-memory multicore systems. We consider a basic application, namely the matrix-matrix product (GEMM), and two parallel matrix decompositions (LU and WZ). Both factorizations rely on parallel BLAS (Basic Linear Algebra Subprograms) operations, GEMM among them. We compare various thread mapping strategies for these applications. Our results show that the choice of thread mapping has a measurable impact on the performance, scalability, and energy consumption of GEMM and of the two matrix factorizations.
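As an illustration of what a thread mapping choice looks like in OpenMP code, here is a minimal sketch of a row-parallel GEMM. It is not the implementation studied in the paper. The function name `gemm_naive` is hypothetical, and the `proc_bind(spread)` policy is only one of the standard OpenMP binding policies, picked here for demonstration.

```c
/* Naive C = A * B for n x n row-major matrices (illustrative sketch only).
 * Rows of C are split statically across OpenMP threads.
 * proc_bind(spread) asks the runtime to distribute threads widely over the
 * places listed in OMP_PLACES; proc_bind(close) would instead pack them
 * onto neighbouring places. Built without OpenMP support, the pragma is
 * ignored and the loop simply runs serially. */
void gemm_naive(long n, const double *a, const double *b, double *c)
{
    #pragma omp parallel for proc_bind(spread) schedule(static)
    for (long i = 0; i < n; ++i) {
        /* Clear row i of C before accumulating into it. */
        for (long j = 0; j < n; ++j)
            c[i * n + j] = 0.0;
        /* i-k-j loop order keeps accesses to B and C contiguous. */
        for (long k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            for (long j = 0; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}
```

With GCC the sketch can be built with `-fopenmp`. The places that threads are bound to can then be selected at run time through the standard `OMP_PLACES` environment variable (for example `OMP_PLACES=cores`), while `OMP_PROC_BIND` sets the default policy for parallel regions that carry no `proc_bind` clause.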
Year
Pages
981–990
Physical description
Bibliography: 17 items, figures, charts, tables.
Contributors
author
  • Marie Curie-Skłodowska University, Institute of Mathematics, Pl. M. Curie-Skłodowskiej 5, 20-031 Lublin, Poland
author
  • Marie Curie-Skłodowska University, Institute of Mathematics, Pl. M. Curie-Skłodowskiej 5, 20-031 Lublin, Poland
Bibliography
  • [1] E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen: LAPACK Users’ Guide, Society for Industrial and Applied Mathematics, Philadelphia, PA, Third Edition, 1999.
  • [2] A. Buttari, J. Langou, J. Kurzak, and J. Dongarra, “A class of parallel tiled linear algebra algorithms for multicore architectures”, Parallel Computing, 35 (1), 38–53 (2009).
  • [3] B. Bylina, “The Block WZ factorization”, Journal of Computational and Applied Mathematics 331, 119–132 (2018).
  • [4] B. Bylina and J. Bylina, “Incomplete WZ factorization as an alternative method of preconditioning for solving Markov chains”, PPAM, volume 4967 of Lecture Notes in Computer Science, 99–107 (2007).
  • [5] B. Bylina and J. Bylina, “Influence of preconditioning and blocking on accuracy in solving Markovian models”, Applied Mathematics and Computer Science, 19 (2), 207–217 (2009).
  • [6] B. Bylina and J. Bylina, “OpenMP thread affinity for matrix factorization on multicore systems”, Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, volume 11 of Annals of Computer Science and Information Systems, 489–492 (2017).
  • [7] S. Chandra Sekhara Rao, “Existence and uniqueness of WZ factorization”, Parallel Computing, 23 (8), 1129–1139 (1997).
  • [8] M. Diener, E. H. M. Cruz, M. A. Z. Alves, P. O. A. Navaux, and I. Koren, “Affinity-based thread and data mapping in shared memory systems”, ACM Comput. Surv., 49 (4), 64:1–64:38 (Dec. 2016).
  • [9] J. Dongarra, J. DuCroz, I. S. Duff, and S. Hammarling, “A set of level-3 Basic Linear Algebra Subprograms”, ACM Trans. Math. Software, 16, 1–28 (1990).
  • [10] J. Dongarra, H. Ltaief, P. Luszczek, and V. M. Weaver, “Energy footprint of advanced dense numerical linear algebra using tile algorithms on multicore architectures”, 2012 Second International Conference on Cloud and Green Computing, 274–281 (Nov. 2012).
  • [11] D. J. Evans and M. Hatzopoulos, “A parallel linear system solver”, International Journal of Computer Mathematics, 7 (3), 227–238 (1979).
  • [12] M. J. Flynn, “Some computer organizations and their effectiveness”, IEEE Trans. Comput., 21 (9), 948–960 (Sep. 1972).
  • [13] E. Rotem, A. Naveh, A. Ananthakrishnan, E. Weissmann, and D. Rajwan, “Power-management architecture of the Intel microarchitecture code-named Sandy Bridge”, IEEE Micro, 32 (2), 20–27 (Mar. 2012).
  • [14] M. Weiland and N. Johnson, “Benchmarking for power consumption monitoring”, Computer Science – Research and Development, 30 (2), 155–163 (May 2015).
  • [15] P. Yalamov and D. J. Evans, “The WZ matrix factorisation method”, Parallel Computing, 21 (7), 1111–1120 (1995).
  • [16] Intel Math Kernel Library, 2014. http://software.intel.com/en-us/articles/intel-mkl/
  • [17] OpenMP Architecture Review Board: OpenMP application program interface version 4.5, May 2015.
Notes
Record prepared under agreement 509/P-DUN/2018, with funds from the Ministry of Science and Higher Education (MNiSW) allocated to science dissemination activities (2019).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-c600b1ad-cff0-461b-9b40-86ee18ad60f5