Data management in CUDA Programming for High Bandwidth Memory in GPU Accelerators

Korpała, G.

Article - details

Article title

Data management in CUDA Programming for High Bandwidth Memory in GPU Accelerators

Authors

Korpała G.

Selected full texts from this journal

http://www.cmms.agh.edu.pl/

Identifiers

Title variants

Zarządzanie pamięcią w programowaniu CUDA dla osiągnięcia wysokiej przepustowości pamięci w akceleratorach GPU

Publication languages

Abstracts

The new High Bandwidth Memory 2 (HBM 2) built into the Tesla P100 enables faster calculations without much effort. HBM 2 on the P100 has a maximum bandwidth of 720 GB/s, which is lower than the bandwidth of the GPU cache and shared memory (SMem) of Kepler GPUs, which reaches almost 2,500 GB/s (Woolley, 2013). On the Kepler architecture it is common to move data into SMem and reduce computation time by lowering the number of accesses to VRAM. For newer GPUs such as Maxwell and Pascal, with much higher memory bandwidth, it is questionable whether using SMem still gives a large performance increase. This publication explains how data management between Video-RAM (VRAM) and the GPU processor must be organized in order to use the full computational power of the GPU (depending on the GPU architecture), using simple models for a three-dimensional calculation.

The new High Bandwidth Memory 2 (HBM 2) used in the Tesla P100 card enables a considerable speed-up of calculations. The HBM 2 memory in the P100 transfers data at a bandwidth of 720 GB/s, which is still lower than the speeds offered by the cache and shared memory (SMem) of GPUs of the Kepler architecture, which reach 2,500 GB/s (Woolley, 2013). A popular approach to shortening computation time on the Kepler architecture is to move data into SMem in order to reduce the number of VRAM accesses. For newer GPUs such as Maxwell and Pascal, which offer considerably higher memory bandwidth, it is doubtful whether using SMem still yields a performance gain. The publication explains how to manage Video-RAM (VRAM) and processor memory in order to fully exploit the computing power of the GPU (depending on the architecture), based on simple models and three-dimensional calculations.
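
The trade-off described in the abstracts, reading neighbours directly from VRAM versus staging them in SMem first, can be illustrated with a minimal CUDA sketch. This is not the paper's code: the 7-point averaging stencil, the tile edge `T = 8`, the clamped border handling, and the names `stencil_global`, `stencil_shared`, and `compare_variants` are all assumptions chosen for illustration.

```cuda
#include <cmath>
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

constexpr int T = 8;  // tile edge (assumed value): 8 x 8 x 8 = 512 threads per block

// Clamp i into [0, n-1]; border cells reuse their nearest in-range neighbour.
__host__ __device__ inline int clampi(int i, int n) {
    return i < 0 ? 0 : (i >= n ? n - 1 : i);
}

// Linear index of cell (x, y, z) in an nx * ny * nz grid stored x-fastest.
__host__ __device__ inline size_t cell(int x, int y, int z, int nx, int ny) {
    return (static_cast<size_t>(z) * ny + y) * nx + x;
}

// Variant A: all seven values are read from global memory (VRAM, via the caches).
__global__ void stencil_global(const float* in, float* out, int nx, int ny, int nz) {
    int x = blockIdx.x * T + threadIdx.x;
    int y = blockIdx.y * T + threadIdx.y;
    int z = blockIdx.z * T + threadIdx.z;
    if (x >= nx || y >= ny || z >= nz) return;
    float s = in[cell(x, y, z, nx, ny)]
            + in[cell(clampi(x - 1, nx), y, z, nx, ny)]
            + in[cell(clampi(x + 1, nx), y, z, nx, ny)]
            + in[cell(x, clampi(y - 1, ny), z, nx, ny)]
            + in[cell(x, clampi(y + 1, ny), z, nx, ny)]
            + in[cell(x, y, clampi(z - 1, nz), nx, ny)]
            + in[cell(x, y, clampi(z + 1, nz), nx, ny)];
    out[cell(x, y, z, nx, ny)] = s / 7.0f;
}

// Variant B: each block first copies its tile plus a one-cell halo into SMem,
// then every thread reads its seven values from SMem only.
__global__ void stencil_shared(const float* in, float* out, int nx, int ny, int nz) {
    __shared__ float tile[T + 2][T + 2][T + 2];
    int x = blockIdx.x * T + threadIdx.x;
    int y = blockIdx.y * T + threadIdx.y;
    int z = blockIdx.z * T + threadIdx.z;
    int tx = threadIdx.x + 1, ty = threadIdx.y + 1, tz = threadIdx.z + 1;
    // Out-of-range threads still load clamped data so all threads reach the barrier.
    int cx = clampi(x, nx), cy = clampi(y, ny), cz = clampi(z, nz);
    tile[tz][ty][tx] = in[cell(cx, cy, cz, nx, ny)];
    if (threadIdx.x == 0)     tile[tz][ty][0]     = in[cell(clampi(x - 1, nx), cy, cz, nx, ny)];
    if (threadIdx.x == T - 1) tile[tz][ty][T + 1] = in[cell(clampi(x + 1, nx), cy, cz, nx, ny)];
    if (threadIdx.y == 0)     tile[tz][0][tx]     = in[cell(cx, clampi(y - 1, ny), cz, nx, ny)];
    if (threadIdx.y == T - 1) tile[tz][T + 1][tx] = in[cell(cx, clampi(y + 1, ny), cz, nx, ny)];
    if (threadIdx.z == 0)     tile[0][ty][tx]     = in[cell(cx, cy, clampi(z - 1, nz), nx, ny)];
    if (threadIdx.z == T - 1) tile[T + 1][ty][tx] = in[cell(cx, cy, clampi(z + 1, nz), nx, ny)];
    __syncthreads();
    if (x >= nx || y >= ny || z >= nz) return;
    float s = tile[tz][ty][tx]
            + tile[tz][ty][tx - 1] + tile[tz][ty][tx + 1]
            + tile[tz][ty - 1][tx] + tile[tz][ty + 1][tx]
            + tile[tz - 1][ty][tx] + tile[tz + 1][ty][tx];
    out[cell(x, y, z, nx, ny)] = s / 7.0f;
}

// Runs both variants on the same synthetic field and returns the largest
// absolute difference between their outputs, or -1 if a CUDA call failed.
float compare_variants(int nx, int ny, int nz) {
    size_t n = static_cast<size_t>(nx) * ny * nz;
    size_t bytes = n * sizeof(float);
    std::vector<float> h_in(n), h_a(n), h_b(n);
    for (size_t i = 0; i < n; ++i) h_in[i] = static_cast<float>(i % 101) * 0.25f;

    float *d_in = nullptr, *d_a = nullptr, *d_b = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(T, T, T);
    dim3 grid((nx + T - 1) / T, (ny + T - 1) / T, (nz + T - 1) / T);
    stencil_global<<<grid, block>>>(d_in, d_a, nx, ny, nz);
    stencil_shared<<<grid, block>>>(d_in, d_b, nx, ny, nz);
    cudaError_t err = cudaGetLastError();                   // launch errors
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // execution errors
    cudaMemcpy(h_a.data(), d_a, bytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(h_b.data(), d_b, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in); cudaFree(d_a); cudaFree(d_b);
    if (err != cudaSuccess) return -1.0f;

    float max_diff = 0.0f;
    for (size_t i = 0; i < n; ++i) max_diff = std::fmax(max_diff, std::fabs(h_a[i] - h_b[i]));
    return max_diff;
}
```

Calling `compare_variants(50, 33, 21)` from a host `main` (compiled with `nvcc`) only checks that the two variants agree. Which one is faster on a given architecture is the question the abstracts raise, and answering it would require timing both kernels on the target GPU.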

Keywords

GPU computation, shared memory, memory management

Publisher

Wydawnictwa AGH

Journal

Computer Methods in Materials Science

Year

2016

Volume

Vol. 16, No. 3

Pages

121--126

Physical description

Bibliography: 6 items, figures.

Contributors

author

Korpała G.

Grzegorz.Korpala@imf.tu-freiberg.de

Institute of Metal Forming, Technische Universität Bergakademie Freiberg, Bernhard v. Cotta Str. 4, 09599 Freiberg, Germany

Bibliography

Ferenc, M., Ferenc, I., Robert, M., Istvan, L., 2011, Simulation of reaction diffusion processes in three dimensions using CUDA. Analytical Platforms for Providing and Handling Massive Chemical Data, 76-85.
Korpala, G., Kawalla, R., 2015, Optimization and application of GPU calculations in material science. Computer Methods in Materials Science, 15(1), 185-191.
Nvidia, 2016, Whitepaper NVIDIA Tesla P100. Available online at: https://images.nvidia.com/content/pdf/tesla/whitepaper/pascal-architecture-whitepaper.pdf, accessed: 15.11.2016.
Sanders, J., Kandrot, E., 2010, An Introduction to General-Purpose GPU Programming. In J. Sanders & E. Kandrot, CUDA by Example.
Wolfram, S., 2015, Wolfram Language Documentation Center Mathematica 10.3. Wolfram.
Woolley, C., 2013, GPU Optimization Fundamentals. Available online at: https://www.olcf.ornl.gov/wp-content/uploads/2013/02/GPU_Opt_Fund-CW1.pdf, accessed: 15.11.2016.

Notes

Prepared with funds from the Ministry of Science and Higher Education (MNiSW) under agreement 812/P-DUN/2016 for science dissemination activities (2017 tasks).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-0443d83b-b415-4e15-9447-78dc55728b9c