Search results - BazTech

1

Execution time prediction model for parallel GPU realizations of discrete transforms computation algorithms

Puchala Dariusz, Stokfiszewski Kamil, Wieloch Kamil

Bulletin of the Polish Academy of Sciences. Technical Sciences | 2022 | Vol. 70, no. 1 | art. no. e139393

EN

Parallel realizations of discrete transform (DT) computation algorithms (DTCAs) on graphics processing units (GPUs) play a significant role in many modern data processing methods used in numerous areas of human activity. In this paper the authors propose a novel execution time prediction model, which allows accurate and rapid estimation of the execution times of various kinds of structurally different DTCAs on GPUs of distinct architectures, without the need to conduct actual experiments on physical hardware. The model can guide the system analyst in choosing the optimal GPU hardware for a given computational task involving a particular DT calculation, or help in choosing the most appropriate parallel implementation of the selected DT given the limitations imposed by the available hardware. Restricting the model to only the key common features of DTCAs enables the authors to significantly simplify its structure, leading to its design as a hybrid analytical-simulation method that jointly exploits the main advantages of both techniques, namely time-effectiveness and high prediction accuracy, while the major weaknesses of the two approaches cancel each other out within the proposed solution. The model is validated experimentally on two structurally different parallel methods of discrete wavelet transform (DWT) computation, i.e. the direct convolution-based and lattice structure-based schemes, by comparing its predictions with actual measurements taken on 6 different graphics cards, representing a fairly broad spectrum of GPU compute architectures. Experimental results show the overall average execution time prediction accuracy of the model to be 97.2%, with a global maximum prediction error of 14.5% recorded across all the conducted experiments, while maintaining a high average evaluation speed of 3.5 ms per single simulation. The results support inferring the model's generality and the possibility of extrapolating it to other DTCAs and different GPU architectures, which, along with the model's straightforwardness, time-effectiveness and ease of practical application, makes it, in the authors' opinion, a very interesting alternative to the related existing solutions.
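As an illustration of the "direct convolution-based" scheme named in this abstract, the following is a minimal sketch of one DWT level computed by filtering and downsampling by two. The Haar filter pair, the function name `dwt_level`, and the periodic boundary handling are assumptions made for this example; the abstract does not say which wavelets or boundary rules the authors used, and this sequential code does not represent their GPU implementation or their prediction model.

```python
import math

def dwt_level(signal, lowpass, highpass):
    """One DWT level: apply both analysis filters, keep every second output.

    Filter taps are applied in forward order (a correlation); conventions
    differ on whether the filter is time-reversed. Periodic wrap-around is
    assumed at the signal boundary.
    """
    n = len(signal)
    if n % 2 != 0:
        raise ValueError("signal length must be even")
    approx, detail = [], []
    for k in range(0, n, 2):  # step of 2 performs the downsampling
        a = sum(h * signal[(k + i) % n] for i, h in enumerate(lowpass))
        d = sum(g * signal[(k + i) % n] for i, g in enumerate(highpass))
        approx.append(a)
        detail.append(d)
    return approx, detail

# Haar analysis filters (assumed for illustration only).
S = 1.0 / math.sqrt(2.0)
HAAR_LOW = [S, S]
HAAR_HIGH = [S, -S]

approx, detail = dwt_level([4.0, 4.0, 2.0, 0.0], HAAR_LOW, HAAR_HIGH)
```

Each output coefficient depends only on a short window of the input, so the iterations of the loop over `k` are independent of one another; that independence is what makes this kind of scheme a natural candidate for parallel execution.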

2

Fast ray casting of function-based surfaces

Vyatkin S. I., Romanyuk A. N., Pavlov S. V., Moskovko M. V., Askarova N., Sagymbekova A., Wójcik W., Kotyra A.

Przegląd Elektrotechniczny | 2017 | Vol. 93, no. 5 | 83--86

EN

This paper deals with fast ray casting of high-quality images, a method of defining free forms without approximating them with polygons or patches, and the use of perturbation functions for animating the surfaces of 3D objects. A method for visualizing functionally defined objects, adapted for graphics processing units (GPUs), is proposed.

PL

The article presents a ray casting method that uses so-called perturbation functions instead of polygon approximation, applied to high-resolution images, for animating the surfaces of three-dimensional objects. In addition, a method for visualizing objects using graphics processing units (GPUs) is proposed.

3

Parallel computation of transient processes on OpenCL framework

Cegielski M.

Przegląd Elektrotechniczny | 2016 | Vol. 92, no. 7 | 75--78

EN

Parallel computation of transient analysis is based on splitting the model into subsystems, which are calculated independently of each other at given time increments. Each such process has high computational complexity. The calculation can therefore be carried out on parallel systems based on GPUs, whose rapid development has been observed for several years. The article presents a brief description of parallel computing systems based on the OpenCL platform using GPUs. It describes how the algorithm can be implemented on this platform and discusses the timing of operations on the GPU relative to calculations on a classic CPU.

PL

Parallel computation of transient-state analysis relies on dividing the model into subsystems that are computed independently of one another at specified time steps. Each such process is characterized by high computational complexity. The computation process permits the use of parallel systems based on GPUs, whose dynamic development has been observed for several years. The article gives a brief characterization of parallel computing systems based on the OpenCL platform using GPUs. The possibility of implementing the algorithm on this platform is described. The timing relationships between computations on graphics processors and computations on classic CPUs are discussed.

4

Akceleracja metody elementów skończonych przy użyciu procesora graficznego [Acceleration of the finite element method using a graphics processor]

Dziekoński A., Lamęcki A., Mrozowski M.

Przegląd Elektrotechniczny | 2016 | Vol. 92, no. 9 | 12--15

PL

The article presents the results of accelerating finite element method computations with a graphics processor. Massively parallel GPU execution of the two most computationally expensive stages, generation of the coefficient matrix and solution of the system of equations by the conjugate gradient method with a V-cycle multilevel preconditioner, shortened the finite element simulation time fivefold.

EN

This paper presents the results of accelerating finite element method computations with graphics processors. A 5-fold acceleration was achieved thanks to the massive parallelization of the two most time-consuming steps of the finite element method, namely matrix generation and the solution of a sparse system of linear equations with the conjugate gradient method and a V-cycle multilevel preconditioner.
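For readers unfamiliar with the solver mentioned in this abstract, here is a minimal sketch of the plain conjugate gradient iteration for a symmetric positive-definite system. It is sequential and unpreconditioned: the V-cycle multilevel preconditioner and the GPU parallelization described by the authors are not reproduced. The function names and the small tridiagonal test matrix are hypothetical choices for this example.

```python
def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A, given only v -> A v."""
    x = [0.0] * len(b)
    r = list(b)                      # residual b - A x for the start x = 0
    p = list(r)                      # first search direction
    rs_old = sum(ri * ri for ri in r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:      # stop once the residual norm is small
            break
        ap = matvec(p)
        alpha = rs_old / sum(pi * api for pi, api in zip(p, ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = sum(ri * ri for ri in r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x

def tridiag_matvec(v):
    """Hypothetical test operator: the stencil [-1, 2, -1] (1D Laplacian)."""
    n = len(v)
    out = []
    for i in range(n):
        left = v[i - 1] if i > 0 else 0.0
        right = v[i + 1] if i < n - 1 else 0.0
        out.append(2.0 * v[i] - left - right)
    return out

b = [1.0, 0.0, 0.0, 1.0]
x = conjugate_gradient(tridiag_matvec, b)
```

The solver touches the matrix only through `matvec`, so the sparse matrix-vector product and the dot products are the operations a parallel implementation would need to accelerate.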

5

A Comparison of Methods for Calculation of Transformation Matrices for Model Order Reduction Using GPU Parallel Computing

Raczyński D.

Zeszyty Naukowe. Elektryka / Politechnika Opolska | 2013 | no. 69 | 91--92

EN

The purpose of this paper is to compare the execution times of programs developed for determining transformation matrices in model order reduction using the balanced realization method. Six popular methods are implemented, namely the RPR, SR, BFSR, EIG-SR, EIGBFSR and Obinata-Anderson methods. Each algorithm is prepared in two versions, one for execution on the main processor (CPU) and the other for execution on a graphics processor (GPU).

6

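Teoretyczny potencjał technologii Nvidia CUDA a rzeczywista wydajność – pomiary wydajności w wybranych zastosowaniach obliczeniowych [Theoretical potential of Nvidia CUDA technology versus actual performance: performance measurements in selected computational applications]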

Pala A.

Zeszyty Naukowe. Elektryka / Politechnika Opolska | 2012 | no. 67 | 83--88

PL

This article attempts to assess the real performance of graphics chips based on Nvidia CUDA technology in computational applications. It also contains a detailed analysis of timing measurements for sample computations implemented in Nvidia CUDA technology.

EN

This paper attempts to assess the actual performance of Nvidia CUDA-based graphics processing units (GPUs) in computational applications. The paper also contains a detailed analysis of timing measurements of sample calculations carried out in Nvidia CUDA technology.

7

Możliwość zastosowania obliczeń równoległych w elektroenergetyce [The possibility of applying parallel computing in electrical power engineering]

Drechny M.

Rynek Energii | 2012 | No. 4 | 66--70

PL

Tasks such as analyzing the impact of connecting new generating units to the power system, analyzing the operation of electrical machines, complex protection automation algorithms, or forecasting electricity production with neural networks may require considerable computational resources. These resources make it possible to obtain simulation (calculation) results in the shortest possible time, although this is not always real time. Computation is most often sped up by using suitably modified algorithms executed on classical processors or signal processors, and by using parallel computing. Within parallel computing, a trend toward graphics processors has been noticeable for several years; their design is optimized for performing many identical operations on different input data (SIMT architecture, Single Instruction Multiple Thread). The paper presents examples of the use in power engineering of parallel computing on graphics processors. It characterizes the possibilities of using this type of processor for computation and presents a simple example comparing computation speed on a classical processor and on a graphics processor.

EN

Issues such as analyzing the impact of connecting new generation units to the power system, analyzing the operation of electrical machines, complex protection automation algorithms, and forecasting power production with artificial neural networks may require significant computational resources. These resources make it possible to obtain simulation (calculation) results in the shortest possible time, although not always in real time. Calculation speed is most often increased by using suitably modified algorithms run on classical processors or signal processors, and by using parallel computing. Within parallel computing, the last several years have seen a tendency to use graphics processors, whose structure is optimized for performing many identical operations on different input data (SIMT architecture - Single Instruction Multiple Thread). The paper presents examples of parallel computing on graphics processors applied in power engineering, characterizes the applicability of this type of processor to calculations, and gives a simple example comparing the speed of calculations on a classical CPU and a GPU.

8

Zarządzanie pamięcią i blokami wątków w obliczeniach równoległych z użyciem architektury CUDA [Memory and thread block management in parallel computing using the CUDA architecture]

Widuch J.

Studia Informatica | 2010 | Vol. 31, no. 4A | 75--96

PL

Thanks to the spread of multi-core processors, data processing by means of parallel computing is becoming increasingly accessible to a wide range of users. An example is the CUDA architecture developed by NVIDIA, an architecture of multi-core graphics processors. A graphics processor can be treated as a SIMD processor with shared memory. Using matrix multiplication as an example, the effect of memory management and thread-block management on computation time with the CUDA architecture was examined.

EN

With the spread of multi-core processors, parallel data processing is becoming more accessible to a wide range of users. An example is the CUDA architecture developed by NVIDIA, which is a multi-core GPU architecture. The GPU can be treated as a SIMD processor with shared memory. The influence of memory management and thread-block management on computation time using the CUDA architecture was studied using matrix multiplication as an example.
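The interplay of thread blocks and memory that this abstract refers to can be sketched with a tiled matrix multiplication. The code below is a sequential Python emulation written for this listing, not the author's CUDA code: the outer loops stand in for a grid of thread blocks, and the small `a_tile` and `b_tile` copies stand in for per-block shared memory. The function name `matmul_tiled` and the `tile` parameter are hypothetical, and square matrices of equal size are assumed.

```python
def matmul_tiled(a, b, tile=2):
    """Multiply two n x n matrices tile by tile, emulating thread blocks."""
    n = len(a)
    c = [[0.0] * n for _ in range(n)]
    for bi in range(0, n, tile):          # block row of C
        for bj in range(0, n, tile):      # block column of C
            for bk in range(0, n, tile):  # walk along the shared dimension
                # Stage one tile of A and one of B in local buffers,
                # standing in for a block's shared memory.
                a_tile = [row[bk:bk + tile] for row in a[bi:bi + tile]]
                b_tile = [row[bj:bj + tile] for row in b[bk:bk + tile]]
                for i, a_row in enumerate(a_tile):    # "thread" row
                    for j in range(len(b_tile[0])):   # "thread" column
                        acc = 0.0
                        for k, a_val in enumerate(a_row):
                            acc += a_val * b_tile[k][j]
                        c[bi + i][bj + j] += acc
    return c

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = matmul_tiled(a, b, tile=1)
```

In a GPU version the two innermost "thread" loops would typically run concurrently within a block, and the tile size would set how much data each block reuses from fast memory before going back to global memory.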