Search results - BazTech

1

Performance enhancement of CUDA applications by overlapping data transfer and Kernel execution

Raju K., Chiplunkar Niranjan N

Applied Computer Science

|

2021

|

Vol. 17, no 3

5--18

EN

The CPU-GPU combination is a widely used heterogeneous computing system in which the CPU and GPU have different address spaces. Since the GPU cannot directly access CPU memory, the input data must be available in GPU memory before the GPU function is invoked. On completion of the GPU function, the results of the computation are transferred to CPU memory. The CPU-GPU data transfer happens over the PCI Express (PCI-E) bus. The PCI-E bandwidth is much lower than that of GPU memory, and it limits the speed at which data is transferred. Hence, PCI-E acts as a performance bottleneck. In this paper two approaches to minimizing the data transfer overhead are discussed: performing the data transfer while the GPU function is being executed, and reducing the amount of data to be transferred to the GPU. The approaches are implemented using CUDA streams, and their effect on the execution time of a set of CUDA applications is evaluated. The results of our experiments show that the execution time of applications can be reduced with the proposed approaches.
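The benefit of overlapping transfers with kernel execution can be estimated with a back-of-the-envelope timing model. The sketch below is not the authors' implementation and uses no measured data. It assumes the work is split into equal chunks, that each stage (host-to-device copy, kernel, device-to-host copy) scales linearly with chunk size, that the three stages can run concurrently on different chunks, and that there is no per-chunk launch overhead. The function names and the example durations are hypothetical.

```python
def serial_time(h2d: float, kernel: float, d2h: float) -> float:
    """Total time when copy-in, kernel and copy-out run one after another."""
    return h2d + kernel + d2h


def overlapped_time(h2d: float, kernel: float, d2h: float, chunks: int) -> float:
    """Idealized pipeline model (an assumption, not a measurement).

    The data is split into `chunks` equal parts. Each part passes through
    three stages, and different parts may occupy different stages at once.
    The first chunk pays for all three stages; every further chunk adds
    only the duration of the slowest stage.
    """
    if chunks < 1:
        raise ValueError("chunks must be at least 1")
    stage = [h2d / chunks, kernel / chunks, d2h / chunks]
    return sum(stage) + (chunks - 1) * max(stage)


if __name__ == "__main__":
    # Hypothetical durations in milliseconds.
    print(serial_time(4.0, 4.0, 4.0))         # no overlap
    print(overlapped_time(4.0, 4.0, 4.0, 4))  # four chunks, idealized overlap
```

In a real CUDA program this kind of overlap is typically obtained by issuing `cudaMemcpyAsync` calls and kernel launches into several streams while using pinned host memory. The model ignores the overheads that limit how many chunks are worthwhile in practice.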

2

Machine Learning and High-Performance Computing Hybrid Systems, a New Way of Performance Acceleration in Engineering and Scientific Applications

Gepner Pawel

Annals of Computer Science and Information Systems

|

2021

|

Vol. 25

27--36

EN

Machine learning is one of the hottest topics in the IT industry as well as in academia. Some IT leaders and scientists believe that it is going to totally revolutionise the industry. This transformation is happening on two fronts: one is the application and software paradigm, the other is the hardware and system level. At the same time, the High-Performance Computing segment is striving to achieve Exascale performance. Reaching that level of performance while keeping system cost and power consumption at a reasonable level is clearly not a trivial task. In this article, we look at a potential solution to these problems and discuss a new approach to building systems and software that meets these challenges and the growing demand for computing power in HPC systems, while also being ready for new types of workload, including Artificial Intelligence applications.

3

A GIS based graph oriented algorithmic model for poly-optimization of waste management system

Gaska K., Generowicz A., Zimoch I., Ciuła J., Siedlarz D.

Architecture Civil Engineering Environment

|

2018

|

Vol. 11, no. 4

151--159

EN

The article presents an integrated (inferential) computer-aided system for waste management designed in component-based technology. The system allows individual elements (system components) to be implemented in native and managed programming languages and performance technologies, ensuring easy integration of those components into one coherent, cooperating whole. One of the key issues is the placement of objects, events and conducted spatial (geographical) analyses in the system through the application of GIS technology: the ability to use digital terrain maps (vector or halftone-based), execution of spatial analyses, data visualization on maps, etc. The system also uses commonly available spatial data provided as part of the Infrastructure for Spatial Information (established under the Act on Infrastructure for Spatial Information).

4

High-performance simulation-based algorithms for an alpine ski racer’s trajectory optimization in heterogeneous computer systems

Dębski R.

International Journal of Applied Mathematics and Computer Science

|

2014

|

Vol. 24, no. 3

551--566

EN

Effective, simulation-based trajectory optimization algorithms adapted to heterogeneous computers are studied with reference to a problem taken from alpine ski racing (the presented solution is probably the most general one published so far). The key idea behind these algorithms is to use a grid-based discretization scheme to transform the continuous optimization problem into a search problem over a specially constructed finite graph, and then to apply dynamic programming to find an approximation of the global solution. In the analyzed example it is the minimum-time ski line, represented as a piecewise-linear function (a method for eliminating infeasible solutions is proposed). Serial and parallel versions of the basic optimization algorithm are presented in detail (pseudo-code, time and memory complexity). Possible extensions of the basic algorithm are also described. The implementation of these algorithms is based on OpenCL. The included experimental results show that contemporary heterogeneous computers can be treated as μ-HPC platforms: they offer high performance (the best speedup was equal to 128) while remaining energy and cost efficient (which is crucial in embedded systems, e.g., trajectory planners of autonomous robots). The presented algorithms can be applied to many trajectory optimization problems, including those having a black-box represented performance measure.
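The grid-plus-dynamic-programming idea can be sketched as follows. This is a simplified illustration, not the algorithm from the paper. It assumes the candidate points are grouped into consecutive layers (for example, one layer per gate), and that the travel time between two points is a black-box function of the two endpoints only. A simulation-based cost would normally also depend on state such as speed. The names `min_time_path`, `layers` and `travel_time` are hypothetical, and at least one feasible path is assumed to exist.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def min_time_path(
    layers: Sequence[Sequence[Point]],
    travel_time: Callable[[Point, Point], float],
) -> Tuple[float, List[Point]]:
    """Dynamic programming over a layered grid graph.

    travel_time(p, q) returns the time to move from p to q, or math.inf
    when the segment is infeasible. Returns the best total time and the
    piecewise-linear path (one point per layer) that achieves it.
    """
    best = [0.0] * len(layers[0])   # best time to reach each start point
    back: List[List[int]] = []      # predecessor indices for each later layer
    for k in range(1, len(layers)):
        new_best, pred = [], []
        for q in layers[k]:
            t, arg = math.inf, -1
            for i, p in enumerate(layers[k - 1]):
                cand = best[i] + travel_time(p, q)
                if cand < t:
                    t, arg = cand, i
            new_best.append(t)
            pred.append(arg)
        best = new_best
        back.append(pred)
    # Pick the best end point and walk the predecessor links backwards.
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [layers[-1][j]]
    for k in range(len(layers) - 1, 0, -1):
        j = back[k - 1][j]
        path.append(layers[k - 1][j])
    path.reverse()
    return total, path


if __name__ == "__main__":
    # Toy cost: straight-line distance at unit speed (illustration only).
    def dist(p: Point, q: Point) -> float:
        return math.hypot(q[0] - p[0], q[1] - p[1])

    grid = [[(0.0, 0.0)], [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)], [(2.0, 0.0)]]
    print(min_time_path(grid, dist))
```

In this sketch the evaluations of `travel_time` for all point pairs between two adjacent layers are independent of one another, so the two inner loops are the natural place to look for parallelism.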

5

Parallelization of the Levenshtein distance algorithm

Niewiarowski A., Stanuszek M.

Czasopismo Techniczne. Nauki Podstawowe

|

2014

|

R. 111, z. 3-NP

109--122

EN

This paper presents a method for parallelizing the Levenshtein distance algorithm applied to very large strings. The proposed approach was implemented using .NET Framework 4.0, with threads managed through the System.Threading.Task namespace library. The algorithms developed in this study were tested on a high-performance machine using Xamarin Mono (for Linux RedHat/Fedora OS). The computational results demonstrate a high level of efficiency of the proposed parallelization procedure.

PL

The article presents a method for parallelizing the Levenshtein edit-distance algorithm intended for very long text strings. The proposed solution was implemented on the .NET Framework 4.0 platform using methods available in the System.Threading.Task namespace. The algorithms were tested on a high-performance computer using Xamarin Mono tools (for Linux RedHat/Fedora). The results show significantly increased computational performance for the solutions presented in the article.
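The abstract does not say how the authors partition the computation, so the sketch below shows only one common way to expose parallelism in the Levenshtein recurrence: sweeping the distance table by anti-diagonals. Every cell on one anti-diagonal depends only on the two previous diagonals, so all cells of a diagonal could be computed concurrently. The code runs them in a plain loop for clarity, and the function name is hypothetical.

```python
def levenshtein_wavefront(a: str, b: str) -> int:
    """Levenshtein distance computed in anti-diagonal (wavefront) order."""
    m, n = len(a), len(b)
    # d[i][j] is the edit distance between a[:i] and b[:j].
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # delete i characters
    for j in range(n + 1):
        d[0][j] = j          # insert j characters
    # Visit interior cells with i + j == k, for k = 2 .. m + n.
    for k in range(2, m + n + 1):
        # Cells on this diagonal read only diagonals k - 1 and k - 2,
        # so the iterations of this inner loop are mutually independent.
        for i in range(max(1, k - n), min(m, k - 1) + 1):
            j = k - i
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
    return d[m][n]


if __name__ == "__main__":
    print(levenshtein_wavefront("kitten", "sitting"))
```

This version keeps the full table for readability. For very large strings only the last two diagonals need to be kept in memory.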

6

A High Performance Computing approach to the simulation of Fluid-Solid Interaction problems with rigid and flexible components

Pazouki A, Serban R, Negrut D

Archive of Mechanical Engineering

|

2014

|

Vol. LXI, nr 2

227--251

EN

This work outlines a unified approach to the direct numerical simulation of Fluid-Solid Interaction (FSI) problems using multi-threaded, large-scale High Performance Computing (HPC). The simulation algorithm relies on the extended Smoothed Particle Hydrodynamics (XSPH) method, which describes the fluid flow in a Lagrangian framework consistent with the Lagrangian tracking of the solid phase. To model rigid and flexible multibody systems, general three-dimensional rigid body dynamics was implemented and the Absolute Nodal Coordinate Formulation (ANCF) was applied. The two-way coupling between the fluid and the solid phase is modeled through Boundary Condition Enforcing (BCE) markers, which capture the fluid-solid coupling forces by enforcing a no-slip boundary condition. The short-range interaction between fluid and solid, which has a decisive influence on the small-scale behavior of fluid-solid mixtures, is resolved with a lubrication force model. The collective system states are integrated in time using an explicit, multi-rate scheme. To reduce the heavy computational load, the overall algorithm emphasizes parallel computing on Graphics Processing Unit (GPU) cards. The work presents a performance and scaling analysis for simulation scenarios involving one or multiple phases, with the number of solid objects reaching tens of thousands. The software implementation of the presented method, called Chrono: Fluid, is part of the Chrono project and is made available free of charge.

7

Comparison between computer clouds and local clusters in CFD software application

Janik Ł., Barnes S.

Studia Informatica

|

2012

|

Vol. 33, nr 4

5--23

EN

This work compares HPC clusters and computing clouds with respect to problem size and communication pattern. The paper addresses aspects such as performance, cost, scalability, elasticity, reliability, and resource utilization. The comparison was made with regard to the needs of CFD (Computational Fluid Dynamics) simulations.

PL

The work compares the performance of high-performance clusters and computing clouds, taking into account the problem and the mode of communication. The publication addresses aspects such as performance, cost, scalability, elasticity, reliability, and the efficiency of resource utilization. The comparison was made with regard to the demands of CFD (Computational Fluid Dynamics) simulations.

8

Wyznaczanie równoległości pętli programowych w aplikacjach dedykowanych dla procesorów graficznych [Extracting the parallelism of program loops in applications dedicated to graphics processors]

Bielecki W., Pałkowski M.

Pomiary Automatyka Kontrola

|

2011

|

R. 57, nr 8

963-965

PL

Extracting parallelism in the form of independent code fragments makes it possible to generate parallel program loops automatically. Such code can exploit the computing power of parallel machines, including multiprocessor graphics cards. This article analyzes the application of algorithms for extracting code fragments to applications dedicated to graphics processors. The speed-up and efficiency of the computations and the scalability of the generated parallel code were examined.

EN

Extracting synchronization-free slices allows parallel loops to be generated automatically. The code can be executed on multiprocessor machines in a reduced period of time. Slicing techniques also enable the generation of parallel code for general-purpose computing on graphics processors. Nowadays, graphics cards support executing multi-threaded applications. GPU systems consist of tens or hundreds of processors. CUDA (an acronym for Compute Unified Device Architecture) is a parallel computing architecture developed by NVIDIA. Graphics processing units (GPUs) are accessible to software developers through variants of industry-standard programming languages. Using CUDA, the latest NVIDIA GPUs become accessible for computation like CPUs. The model for GPU computing is to use a CPU and GPU together in a heterogeneous co-processing computing model. The sequential part of the application runs on the CPU and the computationally intensive part is accelerated by the GPU. From the user's perspective, the application simply runs faster because it uses the high performance of the GPU. In this paper, slicing algorithms for generating parallel code for graphics cards are examined. A short example of the code is presented. CUDA statements and techniques are explained. Memory cost and data transfer are considered. Speed-up, efficiency and scalability of the code are analyzed.
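A toy example may help to show what a synchronization-free slice is. The loop below is a hypothetical illustration, not the example from the paper. In `out[i] = out[i - 2] + 1`, every iteration depends only on the iteration two steps earlier, so the even-indexed and odd-indexed iterations form two dependence chains that never exchange data. Each chain can run on its own thread or processor without synchronization. The sketch runs the slices one after another and checks that the result matches the original loop.

```python
from typing import List


def run_sequential(a: List[int]) -> List[int]:
    """Original loop: iteration i reads the value written by iteration i - 2."""
    out = list(a)
    for i in range(2, len(out)):
        out[i] = out[i - 2] + 1
    return out


def run_slice(out: List[int], start: int) -> None:
    """Execute one slice: the chain start, start + 2, start + 4, ..."""
    for i in range(start + 2, len(out), 2):
        out[i] = out[i - 2] + 1


def run_by_slices(a: List[int]) -> List[int]:
    """Same computation, organized as two slices that share no data."""
    out = list(a)
    for start in (0, 1):   # each slice could be handed to a separate thread
        run_slice(out, start)
    return out


if __name__ == "__main__":
    data = [0, 10, 0, 0, 0, 0]
    print(run_sequential(data))
    print(run_by_slices(data))
```

On a GPU, each such slice would typically be mapped to its own thread. With only two slices this loop offers little parallelism, which is why the number and length of the extracted slices matter for speed-up.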

9

Towards a grid infrastructure for hydro-meteorological research

Schiffers M., Kranzlmuller D., Clematis A., D'Agostino D., Galizia A., Quarati A., Parodi A., Morando M., Rebora N., Trasforini E., Molini L., Siccardi F., Craig G., Tafferner A.

Computer Science

|

2011

|

Vol. 12

45-62

EN

The Distributed Research Infrastructure for Hydro-Meteorological Study (DRIHMS) is a co-ordinated action co-funded by the European Commission. DRIHMS analyzes the main issues that arise when designing and setting up a pan-European Grid-based e-Infrastructure for research activities in the hydrologic and meteorological fields. The main outcome of the project is represented first by a set of Grid usage patterns to support innovative hydro-meteorological research activities, and second by the implications that such patterns define for a dedicated Grid infrastructure and the respective Grid architecture.

PL

The Distributed Research Infrastructure for Hydro-Meteorological Study (DRIHMS) is part of a coordinated action co-funded by the European Commission. The aim of DRIHMS is to analyze the main problems encountered in the fields of hydrology and meteorology. The main outcome of the project will be a set of usage patterns for Grid environments in support of modern hydro-meteorological research, together with the conclusions drawn from that use, which may influence the further development of dedicated Grid solutions.

10

Evaluating new architectural features of the Intel® Xeon® 7500 Processor for HPC workloads

Gepner P., Fraser D.L., Kowalik M.F., Waćkowski K.

Computer Science

|

2011

|

Vol. 12

5-17

EN

In this paper we take a look at what the Intel Xeon Processor 7500 family, code named Nehalem-EX, brings to high performance computing. We compare two families of Intel Xeon based systems (Intel Xeon 7500 and Intel Xeon 5600) and present a performance evolution of 16 node clusters based on these CPUs. We compare CPU generations utilizing dual socket platforms and a cluster across a number of HPC benchmarks, focusing on different performance areas and aspects. We also evaluate technologies and features such as Intel's Hyper-Threading Technology (HT) and Intel Turbo Boost Technology (Turbo Mode), and the performance implications of these technologies for HPC.

PL

In this article we present the capabilities of processors from the Intel Xeon 7500 family in high-performance computing. Two 16-node clusters based on Intel Xeon processor families (7500 and 5600) were compared. The experiment was carried out on clusters built on a hardware platform equipped with two processor sockets, using popular HPC benchmarks and focusing on various aspects of performance. The impact of Intel Hyper Threading and Intel Turbo Boost Technology on computing performance is also presented.