AMD APU systems as a platform for scientific computing

Krużel, F.; Banaś, K.

Article - details

Article title

AMD APU systems as a platform for scientific computing

Authors

Krużel F., Banaś K.

Selected full texts from this journal

http://www.cmms.agh.edu.pl/

Title variants

Systemy AMD APU jako platforma obliczeń naukowo-technicznych

Abstracts

In our current work we investigate the possibility of using the modern AMD APU architecture in scientific and technical computing. The architecture combines a CPU and a GPU in a single Accelerated Processing Unit, which in theory shortens the time needed to exchange data between the two hardware units. This capability solves the problem of the performance bottleneck related to the exchange of data between CPU and GPU memory. Owing to its structure, the architecture can be considered a natural evolution of the concept presented in the IBM PowerXCell processors, which we tested in our past research (Krużel & Banaś, 2013). As reference systems we use both a system based on a similar AMD architecture and a specialized Nvidia Tesla accelerator card. Moreover, because the CPU and GPU parts of the APU have comparable characteristics, we ran our computations on both hardware units separately to see the difference in performance. For testing we used our previously developed finite element numerical integration algorithm implemented in the OpenCL programming framework. The algorithm was tested with various organizations of memory and computing techniques to fully check the hardware capabilities of the APU architecture, both in terms of data exchange and acceleration of calculations. Our research answers the question of whether this architecture is the right future for scientific computing and whether, in the next few years, it will be able to play a significant role in many areas of computational science.

In our current work we investigate the possibility of using the modern AMD APU architecture for scientific and technical computing. This architecture combines CPU and GPU units in a single APU (Accelerated Processing Unit), which in theory shortens the time needed to exchange data between the individual computing units. This capability solves the bottleneck problem associated with exchanging data between CPU and GPU memory. Owing to its design, the architecture can be regarded as a natural evolution of the solution presented in the IBM PowerXCell processors, which we studied earlier (Krużel & Banaś, 2013). To compare the results obtained, we used both a system based on a similar AMD architecture and a system equipped with a specialized Nvidia Tesla card. In addition, because the CPU and GPU built into the APU have comparable characteristics, we ran our computations on each part separately to see the difference between CPU and GPU computations in such an integrated chip. For the tests we used our previously developed numerical integration algorithm implemented in the OpenCL programming environment. The algorithm was tested with various options for organizing memory and computations in order to fully examine the hardware capabilities of the APU architecture, both in terms of data exchange and acceleration of computations. The positive results of our research lead to the conclusion that modern AMD APU architectures are promising for scientific computing and will be able to play a significant role in accelerating computations in the coming years.

Keywords

AMD Accelerated Processing Unit; APU; OpenCL; heterogeneous system architecture; finite element method; numerical integration

Publisher

Wydawnictwa AGH

Journal

Computer Methods in Materials Science

Year

2015

Volume

Vol. 15, No. 2

Pages

362-369

Physical description

21 bibliographic references, figures

Contributors

author

Krużel F.

fkruzel@pk.edu.pl

Cracow University of Technology, Warszawska 24, 31-155 Kraków, Poland

author

Banaś K.

AGH University of Science and Technology, al. Mickiewicza 30, 30-059, Kraków

References

AMD, AMD Developer Summit 2013, San Jose Convention Center.
Banaś, K., Krużel, F., 2014, OpenCL Performance Portability for Xeon Phi Coprocessor and NVIDIA GPUs: A Case Study of Finite Element Numerical Integration, Lecture Notes in Computer Science, 8806, 158-169.
Banaś, K., Płaszewski, P., Maciol, P., 2014, Numerical integration on GPUs for higher order finite elements, Computers & Mathematics with Applications, 67 (6), 1319-1344.
Barker, K. J., Davis, K., Hoisie, A., Kerbyson, D. K., Lang, M., Pakin, S., Sancho, J. C., 2008, Entering the petaflop era: The architecture and performance of Roadrunner, High Performance Computing, Networking, Storage and Analysis, 1-11.
Gaster, B., Kaeli, D., Howes, L., Mistry, P., Schaa, D., 2011, Heterogeneous Computing With OpenCL, Elsevier Science & Technology.
Graczyk, R., Intel Iris Pro 5200 - test; Crysis 3 na integrze?, available online at: http://pclab.pl/art54267.html, PClab.pl digital community, 2013, [Accessed February 10, 2015].
Halfacree, G., 2013, AMD announces Heterogeneous Queuing tech, available online at: http://www.bittech.net/news/hardware/2013/10/22/amdhq/1, bit-tech, [Accessed February 10, 2015].
Heirich, A., Bavoil, L., 2006, Deferred Pixel Shading on the PLAYSTATION 3, available online at: http://research.scea.com/ps3_deferred_shading.pdf, Sony Computer Entertainment US Research & Development, [Accessed February 10, 2015].
Howes, L., Munshi, A., 2014, The OpenCL Specification, Khronos OpenCL Working Group, version 2.0, revision 26.
HSA Foundation, [online] http://www.hsafoundation.com, 2013, [Accessed February 10, 2015].
Intel, OpenCL 2.0 Shared Virtual Memory Overview, Intel, 2014.
Krużel, F., Banaś, K., 2010, Finite element numerical integration on PowerXCell processors, Lecture Notes in Computer Science, 6067, 517-524.
Krużel, F., Banaś, K., 2014, Finite Element Numerical Integration on Xeon Phi coprocessor, Annals of Computer Science and Information Systems, 2, 603-612.
Krużel, F., Banaś, K., 2013, Vectorized OpenCL implementation of numerical integration for higher order finite elements, Computers & Mathematics with Applications, 66(10), 2030-2044.
Kyriazis, G., 2012, Heterogeneous System Architecture: A Technical Review, AMD, revision 1.0.
Landaverde, R., Zhang, T., Coskun, A.K., Herbordt, M., 2014, An Investigation of Unified Memory Access Performance in CUDA, IEEE High Performance Extreme Computing.
Michalik, K., Banaś, K., Płaszewski, P., Cybulka, P., 2013, ModFem: A computational framework for parallel adaptive finite element simulations, Computer Methods in Materials Science, 13 (1), 3-8.
NVIDIA, CUDA C Programming Guide Design Guide, version 6.5, August 2014.
NVIDIA, NVIDIA NVLink High-Speed Interconnect: Application Performance, Whitepaper, 2013.
Rul, S., Vandierendonck, H., D'Haene, J., De Bosschere, K., 2010, An experimental study on performance portability of OpenCL kernels, Application Accelerators in High Performance Computing, 2010 Symposium, Knoxville, TN, USA, 3.
Van Winkle, W., 2012, AMD Fusion: How It Started, Where It's Going, And What It Means, available online at: http://www.tomshardware.com/reviews/fusionhsa-opencl-history,3262-12.html, Tom's Hardware, [Accessed February 10, 2015].

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-fcfae3d3-bcb2-4d08-a738-e870a55ce388