Evaluation of Selected Resource Allocation and Scheduling Methods in Heterogeneous Many-Core Processors and Graphics Processing Units

Ciznicki, M.; Kurowski, K.; Węglarz, J.

doi:10.2478/fcds-2014-0013

Article - details

Abstract

Heterogeneous many-core computing resources are increasingly popular among users due to their improved performance over homogeneous systems. Many developers have realized that heterogeneous systems, e.g. a combination of a shared-memory multi-core CPU machine with massively parallel Graphics Processing Units (GPUs), can provide significant performance opportunities to a wide range of applications. However, the best overall performance can only be achieved if application tasks are efficiently assigned over time to different types of processor units, taking into account their specific resource requirements. Additionally, one should note that available heterogeneous resources have been designed as general-purpose units, albeit with many built-in features accelerating specific application operations. In other words, the same algorithm or application functionality can be implemented as a different task for a CPU or a GPU. Nevertheless, from the perspective of various evaluation criteria, e.g. the total execution time or energy consumption, we may observe completely different results. Therefore, as tasks can be scheduled and managed in many alternative ways on both many-core CPUs and GPUs, with a huge impact on the overall performance of computing resources, there is a need for new and improved resource management techniques. In this paper we discuss results achieved during experimental performance studies of selected task scheduling methods in heterogeneous computing systems. Additionally, we present a new architecture for a resource allocation and task scheduling library which provides a generic application programming interface at the operating system level for improving scheduling policies, taking into account the diversity of tasks and the characteristics of heterogeneous computing resources.

Keywords

scheduling; resource management; GPUs; many-core computing systems

Publisher

Wydawnictwo Politechniki Poznańskiej

Journal

Foundations of Computing and Decision Sciences

Year

2014

Volume

Vol. 39, No. 4

Pages

233-248

Physical description

28 refs., figs.

Authors

author

Ciznicki M.

miloszc@man.poznan.pl

Poznań Supercomputing and Networking Center, Poland
Institute of Computing Science, Poznań University of Technology, Poland

author

Kurowski K.

Poznań Supercomputing and Networking Center, Poland

author

Węglarz J.

Poznań Supercomputing and Networking Center, Poland
Institute of Computing Science, Poznań University of Technology, Poland

References

[1] (2014). Specification of the Zeus cluster, http://www.top500.org/system/177388.
[2] Agullo, E., Demmel, J., Dongarra, J., Hadri, B., Kurzak, J., and Langou, J. (2009). Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects. Journal of Physics: Conference Series, 180:12-37.
[3] Ali, S., Siegel, H., Maheswaran, M., Hensgen, D., and Ali, S. (2000). Representing task and machine heterogeneities for heterogeneous computing systems. Tamkang Journal of Science and Engineering, 3(3):195-208.
[4] Arora, N., Blumofe, R., and Plaxton, C. (1998). Thread scheduling for multiprogrammed multiprocessors. In Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures, pages 119-129. ACM.
[5] Augonnet, C., Thibault, S., and Namyst, R. (2010). StarPU: a Runtime System for Scheduling Tasks over Accelerator-Based Multicore Machines.
[6] Ayguadé, E., Badia, R., Igual, F., Labarta, J., Mayo, R., and Quintana-Ortí, E. (2009). An extension of the StarSs programming model for platforms with multiple GPUs. Euro-Par 2009 Parallel Processing, pages 851-862.
[7] Blazewicz, M., Brandt, S., Kierzynka, M., Kurowski, K., Ludwiczak, B., Tao, J., and Weglarz, J. (2011). CaKernel - A parallel application programming framework for heterogenous computing architectures. Scientific Programming, 4:185-197.
[8] Blazewicz, M., Hinder, I., Koppelman, D., Brandt, S., Ciznicki, M., Kierzynka, M., Löffler, F., Schnetter, E., and Tao, J. (2013). From physics model to results: An optimizing framework for cross-architecture code generation. Scientific Programming, 21(1):1-16.
[9] Chapman, B., Jost, G., and Van Der Pas, R. (2008). Using OpenMP: portable shared memory parallel programming, volume 10. MIT press.
[10] Ciznicki, M., Kierzynka, M., Kopta, P., Kurowski, K., and Gepner, P. (2012a). Benchmarking data and compute intensive applications on modern CPU and GPU architectures. In Procedia Computer Science 9, volume 9, pages 1900-1909.
[11] Ciznicki, M., Kierzynka, M., Kurowski, K., Ludwiczak, B., Napierala, K., and Placzynski, J. (2012b). Efficient isosurface extraction using marching tetrahedra and histogram pyramids on multiple GPUs. In Parallel Processing and Applied Mathematics, pages 343-352. Springer Berlin Heidelberg.
[12] Ciznicki, M., Kopta, P., Kulczewski, M., Kurowski, K., and Gepner, P. (2014). Elliptic solver performance evaluation on modern hardware architectures. In Parallel Processing and Applied Mathematics, pages 155-165. Springer Berlin Heidelberg.
[13] Diamos, G. and Yalamanchili, S. (2008). Harmony: an execution model and runtime for heterogeneous many core systems. In Proceedings of the 17th international symposium on High performance distributed computing, pages 197-200. ACM.
[14] Gropp, W., Lusk, E., and Skjellum, A. (1999). Using MPI: portable parallel programming with the message-passing interface, volume 1. MIT press.
[15] Kamil, S., Chan, C., Oliker, L., Shalf, J., and Williams, S. (2010). An auto tuning framework for parallel multicore stencil computations. Parallel & Distributed Processing, pages 1-12.
[16] Kurowski, K., Oleksiak, A., and Weglarz, J. (2013). Multicriteria, multi-user scheduling in grids with advance reservation. Journal of Scheduling, 13 (5):493-508.
[17] Lee, S., Min, S. J., and Eigenmann, R. (2009). OpenMP to GPGPU: a compiler framework for automatic translation and optimization. ACM Sigplan Notices, 44.4:101-110.
[18] Linderman, M., Collins, J., Wang, H., and Meng, T. (2008). Merge: a programming model for heterogeneous multi-core systems. ACM SIGOPS operating systems review, 42.
[19] Nickolls, J., Buck, I., Garland, M., and Skadron, K. (2008). Scalable parallel programming with CUDA. Queue, 2:40-53.
[20] Shoukat, M., Maheswaran, M., Ali, S., Siegel, H., Hensgen, D., and Freund, R. (1999). Dynamic mapping of a class of independent tasks onto heterogeneous computing systems. In Journal of Parallel and Distributed Computing. Citeseer.
[21] Staples, G. (2006). Torque resource manager. In Proceedings of the 2006 ACM/IEEE Conference on Supercomputing, SC '06, New York, NY, USA. ACM.
[22] Stone, J., Gohara, D., and Shi, G. (2010). OpenCL: A parallel programming standard for heterogeneous computing systems. Computing in science & engineering, 12.3:66.
[23] Teodoro, G., Sachetto, R., Sertel, O., Gurcan, M., Meira, W., Catalyurek, U., and Ferreira, R., editors (2009). Coordinating the use of GPU and CPU for improving performance of compute intensive applications. IEEE.
[24] Topcuoglu, H., Hariri, S., and Wu, M. (2002). Performance-effective and low-complexity task scheduling for heterogeneous computing. IEEE transactions on parallel and distributed systems, pages 260-274.
[25] Torvalds, L. (1999). The Linux edge. Communications of the ACM, 42(4):38-39.
[26] Wesolowski, L. (2008). An application programming interface for general purpose graphics processing units in an asynchronous runtime system. Master's thesis, Dept. of Computer Science, University of Illinois.
[27] Wienke, S., Springer, P., Terboven, C., and an Mey, D. (2012). OpenACC - first experiences with real-world applications. Euro-Par 2012 Parallel Processing, pages 859-870.
[28] Zhou, K., Hou, Q., Ren, Z., Gong, M., Sun, X., and Guo, B. (2009). RenderAnts: interactive Reyes rendering on GPUs. In ACM Transactions on Graphics (TOG), volume 28, page 155. ACM.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-6d276cf3-34dd-40e9-aea0-915af6cd8670