Search results - BazTech

1

Przekształtnik AC/DC/AC/DC do naziemnego zasilania statków powietrznych [AC/DC/AC/DC converter for ground power supply of aircraft]

Kulikowski Krzysztof, Falkowski Piotr, Sikorski Andrzej, Wasilewski Mateusz, Kuźma Adam, Dmitruk Krzysztof, Godlewska Agata, Nowaszewski Krzysztof, Jakubowski Hubert, Stępień Grzegorz

Przegląd Elektrotechniczny | 2023 | Vol. 99, no. 5, pp. 225--230

EN

The article presents an aircraft power supply built from AC/DC and DC/AC/DC power converters with transformer isolation. The power supply can be fed from either of two voltage standards, 400 V/50 Hz or 200 V/400 Hz, and provides two isolated 28 V DC outputs (2x28 V DC). Laboratory tests of the power supply are presented, covering static and dynamic operating states, the THD of the input currents and the efficiency of the device.
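The input-current THD mentioned in the abstract is conventionally the ratio of the RMS of the harmonic components to the fundamental. A minimal sketch of that calculation from a sampled waveform, assuming the record covers a whole number of fundamental periods; the sampling rate, test signal and harmonic limit (`max_order`) are hypothetical, not the authors' measurement setup:

```python
import math

def harmonic_amplitude(samples, fs, freq):
    """Peak amplitude of the `freq` component, via a single-bin DFT.

    Exact only when the record spans an integer number of periods of `freq`.
    """
    n = len(samples)
    re = sum(x * math.cos(2 * math.pi * freq * k / fs) for k, x in enumerate(samples))
    im = sum(x * math.sin(2 * math.pi * freq * k / fs) for k, x in enumerate(samples))
    return 2.0 * math.hypot(re, im) / n

def thd(samples, fs, f1, max_order=40):
    """THD = sqrt(I_2^2 + I_3^2 + ... + I_max^2) / I_1 (a ratio, not a percentage)."""
    i1 = harmonic_amplitude(samples, fs, f1)
    higher = [harmonic_amplitude(samples, fs, h * f1) for h in range(2, max_order + 1)]
    return math.sqrt(sum(a * a for a in higher)) / i1

# Hypothetical test current: 10 A fundamental at 50 Hz plus 5th and 7th harmonics
fs, f1 = 10_000, 50
current = [10 * math.sin(2 * math.pi * f1 * k / fs)
           + 1.0 * math.sin(2 * math.pi * 5 * f1 * k / fs)
           + 0.5 * math.sin(2 * math.pi * 7 * f1 * k / fs)
           for k in range(2000)]            # 0.2 s = 10 full fundamental periods
print(f"THD = {100 * thd(current, fs, f1):.2f} %")
```

With a non-integer number of periods, spectral leakage would bias the estimate; windowing or synchronous sampling is then needed.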

2

GPU implementation of atomic fluid MD simulation

Dawid Aleksander

TASK Quarterly : scientific bulletin of Academic Computer Centre in Gdansk | 2022 | Vol. 26, No. 1, pp. 25--37

EN

A computer simulation of an atomic fluid on a GPU was implemented using the CUDA architecture. It was shown that the programming model for efficient numerical computing applications has been changing with the development of the CUDA architecture. The introduction of the L2 cache decreased the latency between the global GPU memory and the registers. The MD simulation performed using the global memory and registers achieved an average speedup of 80 times relative to the CPU for single-precision calculations. Usually, the shared block memory gives much better results for this kind of calculation: we found that using the shared memory gives a speedup of over 116 times compared to the CPU, which is about 49% faster than using the global memory and registers. It is also shown that the performance of generally available graphics cards in double precision is significantly lower than in single precision; the recorded double-precision speedup relative to the CPU in our experiment averaged 6 and 7 times for the global and shared memory, respectively. The calculations were performed on two different systems with CUDA-enabled devices.
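The gain from shared memory comes from tiling: a thread block loads a tile of particle positions once and every thread reuses it for its own particle. A CPU-only Python sketch of that access pattern, assuming a Lennard-Jones potential in reduced units (the abstract does not name the potential) and no periodic boundaries or cut-off; `forces_tiled` only mimics the tile loop, it does not run on a GPU:

```python
def lj_pair_force(ri, rj, eps=1.0, sigma=1.0):
    """Lennard-Jones force on particle i due to particle j (reduced units)."""
    d = [a - b for a, b in zip(ri, rj)]
    r2 = sum(x * x for x in d)
    sr6 = (sigma * sigma / r2) ** 3
    # F_vec = 24*eps*(2*(sigma/r)^12 - (sigma/r)^6) / r^2 * r_vec
    coef = 24.0 * eps * (2.0 * sr6 * sr6 - sr6) / r2
    return [coef * x for x in d]

def forces_naive(pos):
    """All-pairs forces read directly from the position array ("global memory")."""
    n = len(pos)
    f = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                fij = lj_pair_force(pos[i], pos[j])
                for k in range(3):
                    f[i][k] += fij[k]
    return f

def forces_tiled(pos, tile=4):
    """Same sum, but positions are consumed tile by tile, as a CUDA block would
    stage them in shared memory before every thread uses them."""
    n = len(pos)
    f = [[0.0, 0.0, 0.0] for _ in range(n)]
    for start in range(0, n, tile):
        shared = pos[start:start + tile]   # stand-in for the shared-memory tile
        for i in range(n):                 # one "thread" per particle i
            for off, rj in enumerate(shared):
                j = start + off
                if i != j:
                    fij = lj_pair_force(pos[i], rj)
                    for k in range(3):
                        f[i][k] += fij[k]
    return f

# 12 particles on a small lattice; tile size deliberately not a divisor of n
pos = [(1.2 * x, 1.2 * y, 1.2 * z) for x in range(2) for y in range(2) for z in range(3)]
f_naive = forces_naive(pos)
f_tiled = forces_tiled(pos, tile=5)
```

Both functions add the contributions for each particle in the same j order, so the results agree exactly; only the memory access pattern differs.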

3

Execution time prediction model for parallel GPU realizations of discrete transforms computation algorithms

Puchala Dariusz, Stokfiszewski Kamil, Wieloch Kamil

Bulletin of the Polish Academy of Sciences. Technical Sciences | 2022 | Vol. 70, no. 1, art. no. e139393

EN

Parallel realizations of discrete transform (DT) computation algorithms (DTCAs) on graphics processing units (GPUs) play a significant role in many modern data processing methods used in numerous areas of human activity. In this paper the authors propose a novel execution time prediction model that allows accurate and rapid estimation of the execution times of various structurally different DTCAs on GPUs of distinct architectures, without the need to run actual experiments on physical hardware. The model can guide the system analyst in choosing the optimal GPU hardware for a given computational task involving a particular DT, or help in choosing the most appropriate parallel implementation of the selected DT given the limitations of the available hardware. Restricting the model to the key features common to DTCAs allowed the authors to simplify its structure significantly, which led to its design as a hybrid analytical-simulation method. This method jointly exploits the main advantages of both techniques, namely time-effectiveness and high prediction accuracy, while mutually eliminating their major weaknesses. The model is validated experimentally on two structurally different parallel methods of discrete wavelet transform (DWT) computation, i.e. the direct convolution-based and the lattice structure-based schemes, by comparing its predictions with actual measurements taken on 6 different graphics cards representing a fairly broad spectrum of GPU compute architectures.
Experimental results show the overall average execution time prediction accuracy of the model to be 97.2%, with a global maximum prediction error of 14.5% recorded across all experiments, while maintaining a high average evaluation speed of 3.5 ms per simulation. The results support the generality of the model and the possibility of extrapolating it to other DTCAs and different GPU architectures, which, together with the model's straightforwardness, time-effectiveness and ease of practical application, makes it, in the authors' opinion, a very interesting alternative to existing related solutions.
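To illustrate what "structurally different" DWT schemes means, here is a sketch of one level of the Haar DWT computed both by direct filtering with downsampling and by a single orthogonal lattice (butterfly) stage. The Haar case is chosen for brevity and is an assumption; longer wavelet filters need several lattice stages with delays between them, which are not shown:

```python
import math

H = [1 / math.sqrt(2), 1 / math.sqrt(2)]    # Haar low-pass (analysis) filter
G = [1 / math.sqrt(2), -1 / math.sqrt(2)]   # Haar high-pass (analysis) filter

def dwt_convolution(x):
    """Direct scheme: filter with H and G, keep every second output sample."""
    half = len(x) // 2
    approx = [sum(H[j] * x[2 * k + j] for j in range(2)) for k in range(half)]
    detail = [sum(G[j] * x[2 * k + j] for j in range(2)) for k in range(half)]
    return approx, detail

def dwt_lattice(x, theta=math.pi / 4):
    """Lattice scheme: one orthogonal butterfly (rotation by theta) per input pair."""
    c, s = math.cos(theta), math.sin(theta)
    approx, detail = [], []
    for k in range(len(x) // 2):
        a, b = x[2 * k], x[2 * k + 1]
        approx.append(c * a + s * b)
        detail.append(s * a - c * b)
    return approx, detail

x = [4.0, 2.0, 5.0, 7.0, 1.0, 3.0, 8.0, 6.0]
conv = dwt_convolution(x)
latt = dwt_lattice(x)
```

Both produce the same coefficients, but the convolution scheme evaluates every filter tap per output, while the lattice scheme is organised as stages of independent butterflies; that difference in data flow is what a timing model has to capture.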

4

A Novel FE/MC-based Mathematical Model of Mushy Steel Deformation with GPU Support

Hojny Marcin, Dębiński Tomasz

Archives of Metallurgy and Materials | 2022 | Vol. 67, iss. 2, pp. 735--742

EN

The paper presents the results of work leading to the construction of a spatial hybrid model based on the finite element (FE) and Monte Carlo (MC) methods, allowing computer simulation of the physical phenomena accompanying steel sample testing at temperatures characteristic of the soft-reduction process. The proposed solution accounts for local density variations at the level of the mechanical solution (the incompressibility condition was replaced with the mass conservation condition) and, at the same time, simulates grain growth in a comprehensive resistance heating process combined with local remelting followed by free/controlled cooling of the tested sample. Simulating grain growth in the entire computational domain would not be possible without GPU support: computations on the GPU were 59 times faster than single-threaded computations on the CPU. The study is complemented by examples of experimental and computer simulation results that confirm the correctness of the adopted model assumptions.
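The abstract does not specify the MC grain-growth algorithm; a common choice is the Q-state Potts model, assumed in this minimal 2D sketch at zero temperature, where a site adopts a neighbour's orientation only if the grain-boundary energy does not increase. Lattice size, number of orientations and the seed are illustrative:

```python
import random

def neighbours(i, j, n):
    """Four nearest neighbours on an n x n lattice with periodic boundaries."""
    return [((i - 1) % n, j), ((i + 1) % n, j), (i, (j - 1) % n), (i, (j + 1) % n)]

def site_energy(grid, i, j, state):
    """Unlike neighbours of site (i, j) if it had orientation `state` (J = 1)."""
    return sum(1 for a, b in neighbours(i, j, len(grid)) if grid[a][b] != state)

def total_energy(grid):
    """Grain-boundary energy: every unlike bond is seen from both ends, hence // 2."""
    n = len(grid)
    return sum(site_energy(grid, i, j, grid[i][j]) for i in range(n) for j in range(n)) // 2

def mc_sweep(grid, rng):
    """n*n zero-temperature Potts trials: adopt a neighbour's orientation if dE <= 0."""
    n = len(grid)
    for _ in range(n * n):
        i, j = rng.randrange(n), rng.randrange(n)
        a, b = rng.choice(neighbours(i, j, n))
        old, new = grid[i][j], grid[a][b]
        if new != old and site_energy(grid, i, j, new) <= site_energy(grid, i, j, old):
            grid[i][j] = new

rng = random.Random(1)
grid = [[rng.randrange(8) for _ in range(16)] for _ in range(16)]   # 8 orientations
e_start = total_energy(grid)
for _ in range(5):
    mc_sweep(grid, rng)
e_end = total_energy(grid)
```

Parallel GPU versions of such models commonly update only non-neighbouring sites at the same time (for example in a checkerboard pattern) so that concurrent flips do not interfere; this sequential sketch does not attempt that.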

5

Fenomen sowieckich służb specjalnych [The phenomenon of the Soviet special services]

Świerczek Marek

Wiedza Obronna | 2020 | No. 2, pp. 63--74

EN

The author, analyzing the phenomenon of the effectiveness of the Soviet special services, hypothesizes that it resulted from a combination of several factors. According to the author, the main elements behind the extraordinary efficiency of the Soviet services in the first decade after the October coup were a fusion of the institutional experience of the Okhrana with the knowledge of the criminal-revolutionary milieus that created the VChK, and operational methods unheard of in civilized societies. The author argues that the Soviet services de facto operated outside the law and moral obligations (regarded by the revolutionaries as a bourgeois remnant), which allowed them to use methods and forms of operational work unthinkable in Western Europe. The toolkit of the Soviet services consisted of a mixture of terror, infiltration, provocation and disinformation.

6

Implementation of numerical integration to high-order elements on the GPUs

Krużel Filip, Banaś Krzysztof, Nytko Mateusz

Computer Assisted Methods in Engineering and Science | 2020 | Vol. 27, no. 1, pp. 3--26

EN

This article presents ways to implement a resource-consuming algorithm on hardware with a limited amount of memory, namely the GPU. Numerical integration for higher-order finite element approximation was chosen as the example algorithm. For the computational tests, we use a non-linear geometric element and solve the convection-diffusion-reaction problem. The calculations were run on a Tesla K20m graphics card based on the Kepler architecture and a Radeon R9 280X based on the Tahiti XT architecture. The results of the computational experiments were compared with the theoretical performance of both GPUs, which allowed an assessment of the actual performance. Our research gives suggestions for choosing the optimal design of algorithms as well as the right hardware for such a resource-demanding task.
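Higher-order elements need more quadrature points per element: an n-point Gauss-Legendre rule is exact for polynomials up to degree 2n-1 in each direction, and in 3D the point count grows as n^3, which is one reason such kernels are resource-hungry. A minimal sketch of the rule and of a tensor-product integral over the reference square (not the authors' GPU kernels):

```python
import math

def legendre(n, x):
    """P_n(x) and its derivative P_n'(x) (n >= 1), from the three-term recurrence."""
    p_prev, p = 1.0, x
    for k in range(2, n + 1):
        p_prev, p = p, ((2 * k - 1) * x * p - (k - 1) * p_prev) / k
    dp = n * (x * p - p_prev) / (x * x - 1.0)
    return p, dp

def gauss_legendre(n, tol=1e-14):
    """Nodes and weights of the n-point Gauss-Legendre rule on [-1, 1] (Newton's method)."""
    nodes, weights = [], []
    for i in range(1, n + 1):
        x = math.cos(math.pi * (i - 0.25) / (n + 0.5))   # classical initial guess
        for _ in range(100):
            p, dp = legendre(n, x)
            step = p / dp
            x -= step
            if abs(step) < tol:
                break
        _, dp = legendre(n, x)
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights

def integrate_square(f, n):
    """Tensor-product rule on the reference square [-1, 1]^2."""
    xs, ws = gauss_legendre(n)
    return sum(wi * wj * f(xi, xj) for xi, wi in zip(xs, ws) for xj, wj in zip(xs, ws))

# x^4 * y^2 has degree 4 in x, so 3 points per direction (exact to degree 5) suffice
value = integrate_square(lambda x, y: x ** 4 * y ** 2, 3)   # exact: (2/5) * (2/3) = 4/15
```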

7

Wykorzystanie GPGPU do obliczeń ekspozycji ludności na narażenie pola elektrycznego [Use of GPGPU to calculate public exposure to electric fields]

Wroński Jacek W., Rzeźniczak Krzysztof, Michalski Igor

Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne | 2019 | No. 6, pp. 291--294, CD

EN

This article presents a method of using GPGPU to calculate the levels of non-ionizing electromagnetic fields (EMF) emitted by radiocommunication systems, which are a potential source of public exposure to EMF. The computation times on the GPGPU were compared with those of methods using parallel processing on CPUs.
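A sketch of the per-point calculation, assuming the simple far-field free-space model S = EIRP/(4πr²) and E = √(S·Z₀) with Z₀ ≈ 120π Ω, and adding the power densities of several sources. Real assessments also use antenna radiation patterns and terrain, which are omitted here; the source list is hypothetical. Because every evaluation point is independent, the outer loop is the part that maps naturally onto GPU threads:

```python
import math

def field_strength(eirp_w, r_m):
    """Far-field free-space E (V/m): S = EIRP / (4*pi*r^2), E = sqrt(S * 120*pi)."""
    s = eirp_w / (4.0 * math.pi * r_m ** 2)
    return math.sqrt(s * 120.0 * math.pi)

def exposure_map(sources, points):
    """Total E at each point; power densities of all sources are added (incoherent sum).

    sources: (x, y, z, eirp_w) tuples; points: (x, y, z) tuples, none located at a source.
    """
    result = []
    for p in points:                        # independent per point: one GPU thread each
        e2 = 0.0
        for sx, sy, sz, eirp in sources:
            e2 += field_strength(eirp, math.dist(p, (sx, sy, sz))) ** 2
        result.append(math.sqrt(e2))
    return result

# Two hypothetical 1 kW EIRP transmitters, evaluation point midway between them
sources = [(0.0, 0.0, 0.0, 1000.0), (20.0, 0.0, 0.0, 1000.0)]
e_mid = exposure_map(sources, [(10.0, 0.0, 0.0)])[0]
```

With Z₀ = 120π the formula reduces to E = √(30·EIRP)/r, so each transmitter alone gives √300 ≈ 17.3 V/m at 10 m.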

8

Wykorzystanie CPU i GPU do obliczeń w Matlabie [Using the CPU and GPU for computations in Matlab]

Woźniak Jarosław

Journal of Computer Sciences Institute | 2019 | Vol. 10, pp. 32--35

EN

The article presents selected solutions that use CPUs and GPUs for computations in the Matlab environment. Various methods of performing computations on the CPU and on the GPU were compared, and the differences, advantages, disadvantages and effects of the selected computation methods are indicated.

9

Handling Non-determinism in Spiking Neural P Systems : Algorithms and Simulations

Carandang Jym Paul, Cabarle Francis George C., Adorna Henry Natividad, Hernandez Nestine Hope S., Martínez-del-Amor Miguel Ángel

Fundamenta Informaticae | 2019 | Vol. 164, no. 2-3, pp. 139--155

EN

A Spiking Neural P system is a computing model inspired by the way neurons in a living being are interconnected and exchange information. As a model in membrane computing, it is a non-deterministic and massively parallel system; the latter property makes the GPU a good candidate for accelerating simulations of these models. Matrix representations for systems with and without delays were designed previously, together with algorithms for simulating deterministic systems. So far, however, non-determinism has been problematic for the design of parallel simulators. In this work, an algorithm for simulating non-deterministic spiking neural P systems with delays is presented. To study how the simulations are accelerated on a GPU, the algorithm was implemented in CUDA and used to simulate non-uniform and uniform solutions to the Subset Sum problem as a case study. The analysis is completed with a comparison of the time and space resources used on the GPU by these simulations.
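In the matrix representation of SN P systems without delays, the next configuration is C_{k+1} = C_k + s_k·M, where s_k is a spiking vector (which rules fire) and M is the transition matrix. Non-determinism means that several spiking vectors can be valid at once, and each branch can be computed independently. A toy sketch of that enumeration; the system itself, the use of a set of spike counts in place of a rule's regular expression, and the omission of delays are simplifying assumptions, not the authors' algorithm:

```python
from itertools import product

# Hypothetical toy system: each rule is (neuron, applicable_counts, consumed, produced);
# "applicable_counts" stands in for the rule's regular expression E.
RULES = [
    (0, {2}, 2, 1),   # r1: neuron 0, a^2/a^2 -> a
    (0, {2}, 1, 1),   # r2: neuron 0, a^2/a -> a  (competes with r1: non-determinism)
    (1, {1}, 1, 0),   # r3: neuron 1, forgetting rule a -> lambda
]
SYNAPSES = {0: [1], 1: []}
N_NEURONS = 2

def transition_matrix(rules, synapses, n):
    """Row i: -consumed in the rule's own neuron, +produced in every target neuron."""
    m = []
    for neuron, _, consumed, produced in rules:
        row = [0] * n
        row[neuron] -= consumed
        for target in synapses[neuron]:
            row[target] += produced
        m.append(row)
    return m

def spiking_vectors(config, rules, n):
    """Every valid spiking vector: each neuron with applicable rules fires exactly one."""
    choices = []
    for neuron in range(n):
        applicable = [i for i, r in enumerate(rules) if r[0] == neuron and config[neuron] in r[1]]
        choices.append(applicable or [None])
    for pick in product(*choices):
        vec = [0] * len(rules)
        for i in pick:
            if i is not None:
                vec[i] = 1
        yield vec

def next_configs(config, rules, synapses, n):
    """C_{k+1} = C_k + s_k * M for every valid spiking vector s_k (one per branch)."""
    m = transition_matrix(rules, synapses, n)
    return [tuple(config[j] + sum(s[i] * m[i][j] for i in range(len(rules))) for j in range(n))
            for s in spiking_vectors(config, rules, n)]

branches = next_configs([2, 0], RULES, SYNAPSES, N_NEURONS)   # two possible successors
```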

10

Równoległa realizacja przykładowego algorytmu genetycznego z wykorzystaniem akceleratorów GPU [Parallel implementation of an example genetic algorithm using GPU accelerators]

Ratuszniak P., Stasiak A., Łańcucki R.

Zeszyty Naukowe Wydziału Elektroniki i Informatyki Politechniki Koszalińskiej | 2018 | No. 13, pp. 63--78

EN

The paper presents a practical implementation of a local desktop application that runs an example genetic algorithm using GPU accelerators. The genetic algorithm is applied to a typical optimization problem, the travelling salesman problem. To harness the computing power of the graphics card, the application uses the Nvidia CUDA programming technology.
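A compact sketch of a genetic algorithm for the travelling salesman problem, using tournament selection, a simple variant of order crossover, swap mutation and elitism. These operator choices and parameters are assumptions, not taken from the paper, and the code runs on the CPU; the comment marks the fitness evaluation, the step usually parallelized per individual on a GPU:

```python
import math
import random

def tour_length(tour, cities):
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def order_crossover(p1, p2, rng):
    """Copy a slice of p1, fill the remaining slots with p2's cities in p2's order."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]
    fill = iter(c for c in p2 if c not in child)
    return [c if c is not None else next(fill) for c in child]

def swap_mutation(tour, rng, rate=0.2):
    tour = tour[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(tour)), 2)
        tour[i], tour[j] = tour[j], tour[i]
    return tour

def tournament(pop, fitness, rng, k=3):
    contenders = rng.sample(range(len(pop)), k)
    return pop[min(contenders, key=lambda i: fitness[i])]

def genetic_tsp(cities, pop_size=30, generations=100, seed=0):
    rng = random.Random(seed)
    n = len(cities)
    pop = [rng.sample(range(n), n) for _ in range(pop_size)]
    for _ in range(generations):
        # Fitness evaluation: independent per individual (the usual GPU offload)
        fitness = [tour_length(t, cities) for t in pop]
        elite = pop[min(range(pop_size), key=lambda i: fitness[i])]
        children = [elite]                      # elitism: the best tour always survives
        while len(children) < pop_size:
            child = order_crossover(tournament(pop, fitness, rng),
                                    tournament(pop, fitness, rng), rng)
            children.append(swap_mutation(child, rng))
        pop = children
    best = min(pop, key=lambda t: tour_length(t, cities))
    return best, tour_length(best, cities)

best, length = genetic_tsp([(0, 0), (1, 0), (1, 1), (0, 1)])   # unit square
```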

11

Electrical supply of aircraft during parking

Fiszer R., Matuszczak Z., Jakubowski H., Jaskowiak S.

Journal of KONBiN | 2018 | No. 48, pp. 359--370

EN

This paper discusses selected aspects of supplying parked aircraft with electricity generated by mobile and airfield sources. Because of the advanced electrical and electronic systems, as well as avionic systems, installed on board contemporary aircraft, manufacturers of ground power sources face stringent requirements for the quality of the supplied electricity, set out in the relevant standards and regulations. The quality of electricity generated by ground sources and their compatibility directly affect, among other things, the calibration of aircraft avionics systems during ground flight preparation, which in turn directly contributes to the safety of air operations. Therefore, the ability to monitor the supplied electricity (specific parameters) continuously in real time, enabling immediate identification, recording and correction of deviations, and thus preventing damage or improper preparation of an aircraft for flight, is an important issue.

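Real-time monitoring of ground power comes down to continuously estimating quantities such as RMS voltage and frequency and comparing them with limits. A minimal sketch on a sampled waveform; the nominal values and tolerances below are placeholders for illustration, not limits taken from any aircraft power standard:

```python
import math

def rms(samples):
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def frequency(samples, fs):
    """Frequency from rising zero crossings, linearly interpolated between samples."""
    crossings = []
    for k in range(1, len(samples)):
        a, b = samples[k - 1], samples[k]
        if a < 0.0 <= b:
            crossings.append((k - 1 + (-a) / (b - a)) / fs)
    if len(crossings) < 2:
        raise ValueError("need at least two rising zero crossings")
    return (len(crossings) - 1) / (crossings[-1] - crossings[0])

def check_supply(samples, fs, v_nom, f_nom, v_tol, f_tol):
    """Return (RMS, frequency, alarms); the tolerances are placeholders, not standard values."""
    v, f = rms(samples), frequency(samples, fs)
    alarms = []
    if abs(v - v_nom) > v_tol:
        alarms.append(f"voltage {v:.1f} V outside {v_nom} +/- {v_tol} V")
    if abs(f - f_nom) > f_tol:
        alarms.append(f"frequency {f:.2f} Hz outside {f_nom} +/- {f_tol} Hz")
    return v, f, alarms

# Hypothetical 115 V RMS, 400 Hz phase voltage sampled at 48 kHz for 0.1 s (40 periods)
fs = 48_000
wave = [115.0 * math.sqrt(2) * math.sin(2 * math.pi * 400 * k / fs + 0.3) for k in range(4800)]
v, f, alarms = check_supply(wave, fs, v_nom=115.0, f_nom=400.0, v_tol=5.0, f_tol=5.0)
```

In a real monitor these estimates would run over a sliding window of the incoming samples, with each deviation logged together with a timestamp.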

12

Robust and efficient finite-difference-time-domain modelling of the propagation of nonlinear elastic waves

Pandala A., Shivaprasad S., Krishnamurthy C. V., Balasubramaniam K.

Badania Nieniszczące i Diagnostyka | 2018 | No. 2, pp. 11--21

EN

A robust finite-difference time-domain (FDTD) scheme for modelling nonlinear elastic wave propagation in a homogeneous isotropic material is presented. A formulation based on a rotated staggered grid scheme in a displacement-velocity-stress configuration, incorporating both geometric and material nonlinearities, is proposed. By adopting a Parsimonious algorithm, the computational memory requirement is reduced by 50%. Simulations are accelerated by exploiting the massive data parallelism inherent in the FDTD approach, using parallel computation on Graphical Processing Units with NVIDIA CUDA's API. For the proposed numerical scheme, the grid convergence criterion and the accuracy over propagation distance are investigated. The study is also extended to determine the contributions of the geometric and material models at various input amplitude levels. The time- and frequency-domain signals obtained from the proposed scheme are verified against a commercial finite element solver. The simulation runtime for an aluminium sample of 20 mm x 10 mm with a 5 MHz pulse is of the order of one minute, which makes the proposed numerical scheme attractive for modelling nonlinear elastic waves in large domains.

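The staggered-grid idea behind the scheme can be sketched in its simplest form: linear, 1D, velocity-stress, with velocity and stress stored on interleaved grids and updated alternately (leapfrog). This omits the rotated grid, the displacement field, both nonlinearities and the memory-saving algorithm, and uses nondimensional parameters; it only illustrates that every grid point's update within a half-step is independent, which is the data parallelism exploited on the GPU:

```python
import math

def fdtd_1d(n=400, steps=200, courant=0.5, rho=1.0, modulus=1.0, dx=1.0, width=5.0):
    """Linear 1D velocity-stress FDTD on a staggered grid (nondimensional units).

    v[i] lives at x = i*dx and s[i] (stress) at x = (i + 1/2)*dx; the end velocities
    stay fixed (rigid ends). Returns the velocity field after `steps` steps.
    """
    c = math.sqrt(modulus / rho)
    dt = courant * dx / c                   # stable for courant <= 1 in 1D
    centre = n // 2
    v = [math.exp(-0.5 * ((i - centre) / width) ** 2) for i in range(n)]
    s = [0.0] * (n - 1)
    for _ in range(steps):
        for i in range(n - 1):              # stress update: ds/dt = M dv/dx
            s[i] += dt * modulus * (v[i + 1] - v[i]) / dx
        for i in range(1, n - 1):           # velocity update: rho dv/dt = ds/dx
            v[i] += dt / rho * (s[i] - s[i - 1]) / dx
    return v

v = fdtd_1d()   # the initial pulse splits into two half-amplitude pulses moving apart
```

After 200 steps at Courant number 0.5 the two pulses have travelled about 100 cells each, so their peaks sit near indices 100 and 300.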

13

Demonstrator przenośnego systemu Phased-Array z funkcją Full-Matrix Capture [Demonstrator of a portable Phased-Array system with Full-Matrix Capture]

Lewandowski M., Walczak M., Witek B., Rozbicki J., Steifer T.

Badania Nieniszczące i Diagnostyka | 2018 | No. 3, pp. 70--71

EN

Phased-Array (PA) ultrasonic systems enable the detection and evaluation of defects using multi-element probes with electronic scanning. Advanced beam steering and visualization make it much easier to inspect objects with complex geometries. However, the classic PA method is based on the same physical principles as scanning with standard single-element probes and has the same limitations. In our laboratory we are working on implementing a new class of UT imaging methods based on the Full-Matrix Capture (FMC) and Total Focusing Method (TFM) techniques. These methods provide completely new possibilities for reconstructing defect images and make it possible to obtain a uniform lateral resolution throughout the inspection depth. For this purpose, we have built a portable PA system demonstrator equipped with FMC and TFM functions. Acquisition of the full matrix of echoes and software processing on the built-in GPU (Nvidia® Tegra) provide extensive signal processing and analysis capabilities. The demonstrator is equipped with 32 receive channels in a 32:128 configuration and is compatible with standard Olympus® PA probes.
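In TFM, each image point is focused by summing every transmit/receive A-scan of the FMC data at the corresponding transmitter-to-point-to-receiver time of flight (delay-and-sum). A sketch on synthetic FMC data from a single point scatterer; the array geometry, sound velocity (5.9 mm/µs, typical of longitudinal waves in steel), sampling rate and pulse shape are assumptions. Practical systems usually form the image from the signal envelope rather than from the raw sum:

```python
import math

def fmc_point_scatterer(elements, scatterer, c, fs, n_samples, pulse_width):
    """Synthetic FMC: data[tx][rx] is a Gaussian echo at the tx -> scatterer -> rx delay."""
    data = []
    for tx in elements:
        row = []
        for rx in elements:
            tof = (math.dist(tx, scatterer) + math.dist(scatterer, rx)) / c
            row.append([math.exp(-((k / fs - tof) / pulse_width) ** 2) for k in range(n_samples)])
        data.append(row)
    return data

def sample_at(trace, t, fs):
    """Linear interpolation of a trace at time t (zero outside the record)."""
    pos = t * fs
    k = int(pos)
    if k < 0 or k + 1 >= len(trace):
        return 0.0
    frac = pos - k
    return (1.0 - frac) * trace[k] + frac * trace[k + 1]

def tfm_pixel(data, elements, point, c, fs):
    """TFM: sum every tx/rx signal at the tx -> point -> rx delay (delay-and-sum)."""
    total = 0.0
    for i, tx in enumerate(elements):
        for j, rx in enumerate(elements):
            tof = (math.dist(tx, point) + math.dist(point, rx)) / c
            total += sample_at(data[i][j], tof, fs)
    return abs(total)

C = 5.9        # mm/us, assumed longitudinal velocity
FS = 100.0     # samples per microsecond (100 MHz)
elements = [((k - 3.5) * 0.6, 0.0) for k in range(8)]   # hypothetical 8-element array, 0.6 mm pitch
data = fmc_point_scatterer(elements, (0.0, 20.0), C, FS, 800, pulse_width=0.05)
image = {(x * 0.5, z * 0.5): tfm_pixel(data, elements, (x * 0.5, z * 0.5), C, FS)
         for x in range(-4, 5) for z in range(36, 45)}
peak = max(image, key=image.get)   # expected at the scatterer, (0.0, 20.0)
```

Every pixel is computed independently from the same FMC data set, which is why TFM is a natural fit for an embedded GPU.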

14

Sequential Classification of Palm Gestures Based on A* Algorithm and MLP Neural Network for Quadrocopter Control

Wodziński M., Krzyżanowska A.

Metrology and Measurement Systems | 2017 | Vol. 24, no. 2, pp. 265--276

EN

This paper presents an alternative approach to sequential data classification, based on traditional machine learning algorithms (neural networks, principal component analysis, a multivariate Gaussian anomaly detector) and on finding the shortest path in a directed acyclic graph using the A* algorithm with a regression-based heuristic. Palm gestures were used as an example of sequential data, and a quadrocopter was the controlled object. The study includes the creation of a conceptual model and the practical construction of a system that uses the GPU to ensure real-time operation. The results present the classification accuracy for the chosen gestures and a comparison of computation times between the CPU- and GPU-based solutions.
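Classification is posed here as finding a shortest path in a DAG with A*. A generic A* sketch; the graph and heuristic values below are hypothetical, and the paper's regression-based heuristic is not reproduced. With an admissible heuristic (one that never overestimates the remaining cost) the returned path is optimal:

```python
import heapq

def a_star(graph, start, goal, heuristic):
    """A* shortest path. graph: {node: [(neighbour, cost), ...]}; returns (cost, path)."""
    open_heap = [(heuristic(start), 0.0, start)]
    came_from = {start: None}
    best_g = {start: 0.0}
    while open_heap:
        _, g, node = heapq.heappop(open_heap)
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = came_from[node]
            return g, path[::-1]
        if g > best_g[node]:
            continue                        # stale heap entry, a cheaper route was found
        for nxt, cost in graph.get(node, []):
            ng = g + cost
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                came_from[nxt] = node
                heapq.heappush(open_heap, (ng + heuristic(nxt), ng, nxt))
    return float("inf"), []

# Hypothetical DAG and an admissible heuristic (true remaining costs: S 4, A 3, B 1, C 0)
GRAPH = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("C", 5)], "B": [("C", 1)], "C": []}
H_EST = {"S": 3, "A": 3, "B": 1, "C": 0}
cost, path = a_star(GRAPH, "S", "C", lambda n: H_EST[n])
```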

15

Assessment of various GPU acceleration strategies in text categorization processing flow

Korduła Ł., Wielgosz M., Karwatowski M., Pietroń M., Żurek D., Wiatr K.

Measurement Automation Monitoring | 2017 | Vol. 63, No. 6, pp. 203--205

EN

Automatic text categorization presents many difficulties. Modern algorithms are getting better at extracting meaningful information from human language; however, they often significantly increase the complexity of computations. This increased demand for computational capability can be met by using hardware accelerators such as general-purpose graphics cards. In this paper we present a full processing flow for a document categorization system. Accelerating the Gram-Schmidt process and the signature calculation gave up to a 12-fold decrease in the computing time of these system components.
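The abstract names the Gram-Schmidt process without detailing how it is used in the categorization flow. For reference, a sketch of the modified Gram-Schmidt variant, which is numerically more stable than the classical one; the tolerance for dropping dependent vectors is an assumption:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def modified_gram_schmidt(vectors, eps=1e-12):
    """Orthonormalize `vectors` (lists of floats); numerically dependent ones are dropped."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            proj = dot(w, q)                # project the *updated* w: the "modified" variant
            w = [wi - proj * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > eps:
            basis.append([wi / norm for wi in w])
    return basis

# The fourth vector is the sum of the first two, so only three survive
basis = modified_gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0],
                               [0.0, 1.0, 1.0], [2.0, 1.0, 1.0]])
```

The inner projections for a given vector are sequential, but each dot product and vector update is a reduction or element-wise operation, which is what a GPU implementation parallelizes.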

16

Komputerowe symulacje procesów fizycznych z zastosowaniem heterogenicznych układów wielordzeniowych [Computer simulations of physical processes using heterogeneous many-core systems]

Michalski G.

Studia i Materiały / Europejska Uczelnia Informatyczno-Ekonomiczna w Warszawie | 2016 | No. 1(11), pp. 13--21

EN

Problems faced by today's engineers very often require complex computer simulations of the phenomenon under consideration. In the great majority of such simulations, distributions of various physical quantities, such as temperature, strain or displacement, are computed. Due to the high complexity of these tasks, performing them on general-purpose processors becomes inefficient, and engineers increasingly turn to modern heterogeneous many-core systems such as GPUs. Using these hardware solutions can significantly speed up the computations. In this work the author presents a computer simulation of the solidification of a casting in a mould using nVidia graphics processors compatible with the CUDA architecture.
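Solidification simulations of this kind are built on transient heat conduction. A minimal 1D explicit finite-difference (FTCS) sketch with fixed mould-side temperatures, in nondimensional units; latent heat release and the mould geometry are omitted, so this is not the author's model. Each interior node's update depends only on the previous step, which is why such stencils parallelize well on GPUs:

```python
def heat_step(t, alpha, dx, dt):
    """One explicit (FTCS) step of dT/dt = alpha * d2T/dx2 with fixed end temperatures."""
    r = alpha * dt / dx ** 2
    if r > 0.5:
        raise ValueError("explicit scheme unstable: alpha*dt/dx^2 must be <= 0.5")
    new = t[:]
    for i in range(1, len(t) - 1):          # independent per node: one GPU thread each
        new[i] = t[i] + r * (t[i + 1] - 2.0 * t[i] + t[i - 1])
    return new

# Hypothetical melt at 1500 between mould walls held at 20 (nondimensional units)
temps = [20.0] + [1500.0] * 48 + [20.0]
for _ in range(500):
    temps = heat_step(temps, alpha=1.0, dx=1.0, dt=0.4)
```

For r ≤ 0.5 each new value is a weighted average of old neighbouring values, so temperatures stay between the wall and initial melt temperatures.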

17

Implementacja metody momentów w heterogenicznym środowisku obliczeniowym CPU/GPU [Implementation of the method of moments in a heterogeneous CPU/GPU computing environment]

Karwowski A., Topa T., Noga A.

Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne | 2016 | No. 6, pp. 503--505, CD

EN

The paper describes an implementation of the Method of Moments, a standard tool for analysing various electromagnetic engineering problems (antennas, electromagnetic compatibility, microwaves), on a heterogeneous CPU/GPU platform of a typical low-cost desktop workstation. It demonstrates that the performance of the method can be improved significantly by exploiting the parallel processing capabilities of both the multi-core CPU and the stream processors of the graphics card.
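The core of the Method of Moments is turning an integral equation into a dense linear system Z·q = V. A textbook electrostatic sketch (a thin straight wire held at 1 V, pulse basis functions, point matching), chosen as an assumption for brevity rather than the authors' electromagnetic formulation; the matrix fill is independent per entry and is the part typically offloaded to a GPU, while the dense solve here is plain Gaussian elimination:

```python
import math

EPS0 = 8.854187817e-12   # vacuum permittivity, F/m

def solve(a, b):
    """Gaussian elimination with partial pivoting (small dense systems only)."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= f * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def wire_charge(length=1.0, radius=1e-3, n=50, volts=1.0):
    """Line-charge density (C/m) on n segments of a thin wire held at `volts`.

    Z[m][k]: potential at the centre of segment m due to unit line charge on segment k,
    using the thin-wire kernel 1/sqrt((z - z')^2 + a^2), integrated analytically.
    """
    dz = length / n
    centres = [(k + 0.5) * dz for k in range(n)]
    z_mat = [[(math.asinh((centres[k] + dz / 2 - zm) / radius)
               - math.asinh((centres[k] - dz / 2 - zm) / radius)) / (4 * math.pi * EPS0)
              for k in range(n)] for zm in centres]
    return solve(z_mat, [volts] * n)

q = wire_charge()
capacitance = sum(q) * (1.0 / len(q)) / 1.0   # total charge / voltage, in farads
```

The computed charge density peaks at the wire ends, and the capacitance of a 1 m wire with 1 mm radius comes out at a few picofarads.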

18

Parallel computation of transient processes on OpenCL framework

Cegielski M.

Przegląd Elektrotechniczny | 2016 | Vol. 92, no. 7, pp. 75--78

EN

Parallel transient analysis is based on splitting the model into sub-systems, which within given time increments are calculated independently of each other. Each such process has a high computational complexity. The structure of these calculations makes it possible to use GPU-based parallel systems, which have developed rapidly in recent years. The article presents a brief description of parallel computing systems based on the OpenCL platform using GPUs, describes how the algorithm can be implemented on this platform, and discusses the execution times of the operations on the GPU compared with calculations on a classic CPU.

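A toy illustration of the partitioning idea: a chain of RC subsystems, each advanced by backward Euler, with interface values frozen at the previous time step so that all subsystem updates within a step are independent. The circuit, the buffered (non-loading) coupling and the parameters are hypothetical, not the author's model:

```python
def rc_step(v, v_in, r, c, dt):
    """Backward-Euler step of C dv/dt = (v_in - v) / R for one RC subsystem."""
    k = dt / (r * c)
    return (v + k * v_in) / (1.0 + k)

def simulate_chain(n_sub=4, v_source=10.0, r=1e3, c=1e-4, dt=1e-2, steps=2000):
    """Chain of RC subsystems; subsystem k is driven by subsystem k-1's voltage from the
    *previous* step, so within a step every update can run in parallel."""
    v = [0.0] * n_sub
    for _ in range(steps):
        inputs = [v_source] + v[:-1]        # interface values frozen at the last step
        v = [rc_step(vk, uk, r, c, dt) for vk, uk in zip(v, inputs)]   # parallelizable map
    return v

v_final = simulate_chain()          # 200 time constants: every stage settles at 10 V
v_first = simulate_chain(steps=1)   # after one step only the first stage has responded
```

The one-step delay at each interface is the price paid for independence; it is acceptable when the time step is small compared with the subsystems' time constants.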

19

Effectiveness of Fast Fourier Transform implementations on GPU and CPU

Puchała D., Stokfiszewski K., Szczepaniak B., Yatsymirskyy M.

Przegląd Elektrotechniczny | 2016 | Vol. 92, no. 7, pp. 69--71

EN

In this paper, we compare the effectiveness of selected variants of radix-2 Fast Fourier Transform (FFT) algorithms implemented on both Graphics (GPU) and Central (CPU) Processing Units. The considered algorithms differ in memory consumption and in the arrangement of data-flow paths, which affects global memory coalescing and cache memory exploitation. The results make it possible to identify the FFT variants best suited to GPU and CPU architectures, to confirm the advisability of GPU-oriented FFT computation, and to formulate guidelines for GPU-oriented implementations of fast algorithms of various linear transforms.

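One common radix-2 variant is the in-place iterative decimation-in-time FFT with a bit-reversal permutation; its strided butterfly access pattern is the kind of data-flow property the paper relates to memory coalescing and caching. A sketch checked against a direct DFT (it is not one of the authors' specific variants):

```python
import cmath

def fft_radix2(x):
    """Iterative in-place radix-2 decimation-in-time FFT (len(x) must be a power of two)."""
    n = len(x)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    a = [complex(v) for v in x]
    j = 0
    for i in range(1, n):                   # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    size = 2
    while size <= n:                        # log2(n) stages of butterflies
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1.0
            for k in range(size // 2):
                u = a[start + k]
                t = w * a[start + k + size // 2]
                a[start + k] = u + t
                a[start + k + size // 2] = u - t
                w *= w_step
        size *= 2
    return a

def dft(x):
    """Direct O(n^2) DFT, used only as a reference."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n)) for k in range(n)]

signal = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, 2.5, 0.5]
spectrum = fft_radix2(signal)
```

Within one stage all butterflies are independent, but the distance between paired elements doubles from stage to stage, so the memory access pattern changes during the transform.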

20

Akceleracja metody elementów skończonych przy użyciu procesora graficznego [Acceleration of the finite element method using a graphics processor]

Dziekoński A., Lamęcki A., Mrozowski M.

Przegląd Elektrotechniczny | 2016 | Vol. 92, no. 9, pp. 12--15

EN

This paper presents the results of accelerating finite element method computations with a graphics processor. A 5-fold reduction in simulation time was achieved thanks to massive parallelization on the GPU of the two most computationally expensive steps of the finite element method, namely the generation of the coefficient matrix and the solution of the sparse system of linear equations with the conjugate gradient method and a V-cycle multilevel preconditioner.
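The solver stage is a preconditioned conjugate gradient (PCG) method. A sketch of PCG on a 1D Laplacian; for brevity it swaps in a simple Jacobi (diagonal) preconditioner instead of the V-cycle multilevel preconditioner used in the paper. The matrix-vector product and the vector updates are the operations parallelized on a GPU:

```python
def matvec_laplacian(x):
    """y = A x for the 1D Dirichlet Laplacian A = tridiag(-1, 2, -1)."""
    n = len(x)
    return [2.0 * x[i] - (x[i - 1] if i > 0 else 0.0) - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pcg(matvec, b, precond, tol=1e-10, max_iter=1000):
    """Preconditioned conjugate gradients for a symmetric positive definite operator."""
    x = [0.0] * len(b)
    r = b[:]                                # r = b - A x with x = 0
    z = precond(r)
    p = z[:]
    rz = dot(r, z)
    for it in range(max_iter):
        ap = matvec(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pk for xi, pk in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if dot(r, r) ** 0.5 < tol:
            return x, it + 1
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pk for zi, pk in zip(z, p)]
        rz = rz_new
    return x, max_iter

def jacobi(r):
    return [ri / 2.0 for ri in r]           # the diagonal of A is 2 everywhere

b = [1.0] * 50
x, iterations = pcg(matvec_laplacian, b, jacobi)
residual = max(abs(bi - yi) for bi, yi in zip(b, matvec_laplacian(x)))
```

A multilevel preconditioner would replace `jacobi` with one V-cycle over a hierarchy of coarser grids, keeping the CG loop itself unchanged.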