BazTech - Yadda

1

Practical parallelization of Gear-Nordsieck and Brayton-Gustavson-Hatchel stiff ODE solver

Stabrowski Marek

Annals of Computer Science and Information Systems

|

2021

|

Vol. 25

313--316

EN

The paper compares two ODE solvers using a heat transfer equation as an example. The sequential version of the Brayton-Gustavson-Hatchel (BGH) solver was slightly inferior to the Gear-Nordsieck solver. Profiling of the algorithms led to the decision to parallelize the linear equation solving section and the function evaluation. The first approach (parallelizing the linear equation solver) improves the performance of both algorithms. The second approach (parallelizing function evaluation) boosts the performance of the BGH solver. Finally, the fully parallel version of the BGH solver is shown to be more efficient with respect to processing time.

2

Parallel computations and co-simulation in universal mechanism software. Part 2: Examples

Pogorelov Dmitry, Rodikov Alexander, Kovalev Roman

Transport Problems

|

2019

|

T. 14, z. 4

31--38

EN

The second part of the paper continues the discussion of parallel computations in railway dynamics. The algorithms described in the first part are applied to parallel simulation, on computers with multicore processors, of six different models of rail vehicles and trains with the number of degrees of freedom ranging from about one hundred to more than 20 thousand. A considerable simulation speedup is reported. In addition, an example of evaluating wheel profile wear on multicore processors and a comparison of different approaches to multi-variant computations are considered.

3

Parallel computations and co-simulation in universal mechanism software. Part 1: Algorithms and implementation

Pogorelov Dmitry, Rodikov Alexander, Kovalev Roman

Transport Problems

|

2019

|

T. 14, z. 3

163--175

EN

Parallel computations speed up the simulation of multibody system dynamics, in particular the dynamics of railway vehicles and trains. This is important for reducing the time required at the design stage of new railway vehicles, for increasing the complexity of the problems studied, and for real-time applications. We consider the implementation of parallel computations in Universal Mechanism software in three different areas: simulation of rail vehicle and train dynamics, evaluation of wheel profile wear, and multi-variant computations. The use of clusters for running multi-variant computations in parallel is illustrated. Co-simulation based on the interface between Universal Mechanism and Matlab/Simulink and other software tools is discussed.

4

Architektura węzłowa : superkomputer klasy Beowulf

Lenarczyk P., Piotrowski Z.

Elektronika : konstrukcje, technologie, zastosowania

|

2018

|

Vol. 59, nr 2

12--14

PL

Zapotrzebowanie na możliwości obliczeniowe nieustannie wzrasta w wielu dziedzinach wiedzy. Dotyczy to również działów, które wcześniej uznawane były za niewymagające obliczeniowo. Szczególną odpowiedzią jest technologia superkomputerów, których liczba dynamicznie zwiększa się. Możliwość rozwoju poszczególnych dziedzin może zostać w znacznym stopniu ułatwiona dzięki rozwojowi technologii Obliczeń Ogólnego Przeznaczenia z użyciem typowych graficznych procesorów masowo równoległych. W artykule zawarto kompletny opis budowy superkomputera w architekturze węzłowej, wraz z opisem problemów związanych z praktyczną implementacją.

EN

Computational demands are growing rapidly in many scientific areas. This also applies to engineering branches that were previously considered not very computationally demanding. One answer is supercomputer technology, and the number of supercomputers is increasing dynamically. Particular attention should be given to supercomputers with General-Purpose Graphics Processing Unit (GPGPU) technology; such coprocessors could readily enhance computational power in many scientific areas of interest. The paper describes the node architecture of a supercomputer, together with the problems encountered in its practical implementation.

5

Porównanie metod obliczeń równoległych OpenMP i CUDA

Maj Michał

Zeszyty Naukowe WSEI. Seria Transport i Informatyka

|

2015

|

T. 5, nr 1

19--27

PL

Programowanie równoległe oznacza tworzenie programów w taki sposób, by można je było wykonywać równocześnie na wielu procesorach. Na potrzeby niniejszego artykułu napisane zostały dwa programy zrównoleglone – jeden w CUDA C oraz jeden w OpenMP, przeznaczony dla CPU – oraz jeden sekwencyjny (niewspółbieżny). Najszybszym sposobem zrównoleglania okazał się program napisany w CUDA, w którym wykorzystuje się pamięć niekopiowaną. Wadą CUDA jest to, że działa tylko ze sprzętem firmy NVIDIA.

EN

Parallel programming means developing programs that can be executed truly concurrently on multiprocessor platforms. For the tests reported here, two parallel programs were developed: one in the CUDA C language and one using the OpenMP library. An equivalent sequential (non-parallel) program was also developed. The most efficient parallelization was achieved in the CUDA program with page-locked memory. CUDA is handicapped by its limitation to NVIDIA hardware.

6

Scalability tests of the direct numerical simulation solver UNS3

Szeliga W., Morzyński M., Stankiewicz W., Kotecki K.

Journal of Mechanical and Transport Engineering

|

2015

|

Vol. 67, No. 4

59--69

EN

This paper presents an analysis of the scalability of the solver UNS3, dedicated to direct numerical simulation (DNS) of the Navier-Stokes equations. The efficiency of parallel computations was examined using a PC cluster built by the Division of Virtual Engineering. Tests were carried out with different numbers of partitions, in the range 1÷80. The test case was steady flow around a wall-mounted circular cylinder with the Reynolds number set to Re = 10. The research included the measurement of preparatory time, calculation time, communication time, speedup, core hours and efficiency.

PL

W niniejszym artykule zawarto analizę skalowalności solwera UNS3 służącego do obliczeń CFD (ang. computational fluid dynamics) typu DNS (ang. direct numerical simulation). Skuteczność wykorzystania wielowątkowości sprawdzano przy użyciu klastra Zakładu Inżynierii Wirtualnej. Badania prowadzono na procesorach typu Intel® Core™ 2 Quad oraz Intel® Xeon® przy ilości partycji w zakresie 1÷80. Za testowe zadanie posłużyły obliczenia stacjonarne opływu cylindra o przekroju kołowym zamocowanego na ścianie, przy liczbie Reynoldsa Re = 10. Badano czas obliczeń, czas komunikacji międzywęzłowej, przyspieszenie w wyniku zrównoleglenia, zużycie zasobów oraz efektywność ich wykorzystania.
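
The scalability quantities listed in this abstract (speedup, efficiency, core hours) can be sketched with their usual textbook definitions. The abstract does not state the exact formulas used in the paper, so the definitions below, the function names, and the timing values are assumptions for illustration only.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T_1 / T_p (common definition; assumed, not taken from the paper)."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_partitions):
    """Parallel efficiency E = S / p, where p is the number of partitions."""
    return speedup(t_serial, t_parallel) / n_partitions


def core_hours(t_parallel_seconds, n_partitions):
    """Resources consumed: wall-clock time multiplied by the number of cores used."""
    return t_parallel_seconds * n_partitions / 3600.0


# Hypothetical timings, not measurements from the paper.
t1, t8 = 100.0, 25.0
print(speedup(t1, t8), efficiency(t1, t8, 8), core_hours(3600.0, 80))
```

An efficiency well below 1 at high partition counts typically indicates that communication time is becoming a significant share of the total run time.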

7

Event monitoring of parallel computations

Gruzlikov A. M., Kolesov N. V., Tolmacheva M. V.

International Journal of Applied Mathematics and Computer Science

|

2015

|

Vol. 25, no. 2

311--321

EN

The paper considers the monitoring of parallel computations for detection of abnormal events. It is assumed that computations are organized according to an event model, and monitoring is based on specific test sequences.

8

Obliczeniowa opłacalność zrównoleglenia drobnoziarnistego algorytmu numerycznego całkowania typu PECE

Gardecki A.

Przegląd Elektrotechniczny

|

2014

|

R. 90, nr 11

67--69

PL

W artykule omówiono problemy związane z przystosowaniem metody rozwiązywania układów równań różniczkowych zwyczajnych typu predyktor-korektor (PECE) do obliczeń w układach równoległych. Zastosowanie tego typu algorytmów może być obliczeniowo opłacalne, szczególnie gdy obliczanie funkcji prawej strony równania różniczkowego jest kosztowne. Jednakże obliczenia równoległe z wieloma punktami synchronizacji mogą powodować wydłużenie czasu obliczeń w porównaniu do obliczeń sekwencyjnych.

EN

This paper presents a performance analysis of the predictor-corrector (PECE) numerical integration method in parallel computations. Using parallel algorithms for initial value problems can be computationally worthwhile, especially if evaluating the right-hand-side function of the differential equation is time-consuming. However, parallel calculations with many synchronization points may take longer than the sequential calculation.
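
As a rough illustration of the PECE scheme named in this abstract, the sketch below performs predict-evaluate-correct-evaluate steps using an explicit Euler predictor and a trapezoidal corrector. This predictor/corrector pair and the names `pece_step` and `integrate` are assumptions chosen for brevity; the abstract does not give the formulas used in the paper. The two evaluations of the right-hand side `f` in each step are the costly parts that a parallel version would target.

```python
import math


def pece_step(f, t, y, h, fy):
    """One PECE step for y' = f(t, y); fy is f(t, y) carried over from the previous step."""
    y_pred = y + h * fy                   # P: explicit Euler predictor
    f_pred = f(t + h, y_pred)             # E: evaluate f at the predicted value
    y_new = y + 0.5 * h * (fy + f_pred)   # C: trapezoidal corrector
    f_new = f(t + h, y_new)               # E: evaluate f at the corrected value
    return y_new, f_new


def integrate(f, t0, y0, t_end, n_steps):
    """Integrate from t0 to t_end with a fixed step size."""
    h = (t_end - t0) / n_steps
    y, fy = y0, f(t0, y0)
    for i in range(n_steps):
        y, fy = pece_step(f, t0 + i * h, y, h, fy)
    return y


# Test problem y' = -y, y(0) = 1, with exact solution exp(-t).
approx = integrate(lambda t, y: -y, 0.0, 1.0, 1.0, 100)
print(approx, math.exp(-1.0))
```

Each step contains sequential dependencies (the corrector needs the predictor's evaluation), which is one way to see why fine-grained parallel versions need synchronization points.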

9

Analiza kosztów obliczeń wybranych wariantów zrównoleglenia algorytmu rozwiązywania równań różniczkowych w obliczeniach obserwatora stanu

Gardecki A.

Przegląd Elektrotechniczny

|

2014

|

R. 90, nr 2

205--208

PL

W artykule przedstawiono porównanie wydajności obliczeniowej procedury BGKODE_DSP służącej do rozwiązywania układów równań różniczkowych zwyczajnych (ODE) w obliczeniach równoległych na przykładzie obliczeń obserwatora stanu. Modyfikacje sposobu realizacji obliczeń w celu minimalizacji liczby punktów synchronizacji przyczyniły się do wzrostu jej wydajności obliczeniowej.

EN

This paper presents a performance analysis of the BGKODE_DSP routine, used to solve systems of ordinary differential equations (ODE), in parallel computations, using the calculations of a state observer as an example. Modifying the way the calculations are carried out so as to minimize the number of synchronization points increased its computational efficiency.

10

Semantyczny system wsparcia operatorów pojazdów mobilnych z zastosowaniem obliczeń równoległych

Musialik P., Masłowski A.

Prace Naukowe Politechniki Warszawskiej. Elektronika

|

2014

|

z. 194, t. 2

521--530

PL

W poniższym artykule został przedstawiony postęp prac nad semantycznym systemem wsparcia, przeznaczonym dla operatorów pojazdów mobilnych, naziemnych i latających. Szczególny nacisk położony jest na działania w środowisku SAR (Search And Rescue). Celem systemu jest obniżenie obciążenia kognitywnego operatora. Proponowanym rozwiązaniem jest stworzenie modelu semantycznego otoczenia pojazdu, który umożliwia uporządkowanie informacji płynących z sensorów i pozwala na przeprowadzenie rozumowania w oparciu o zgromadzone dane. Artykuł koncentruje się na tworzeniu wspomnianego modelu na podstawie chmur punktów 3D. Opisano równoległą implementację algorytmu klasyfikacji punktów, metody RGB (Regular Grid Decomposition) oraz segmentacji. Przedstawiono przykłady rozumowania opartego o stworzony model.

EN

The paper presents a semantic support system for operators of mobile platforms, both air and ground. The goal of the system is to decrease the cognitive load of the operator. This is achieved by integrating the platforms' sensor data into a semantic model of the surroundings. The model is created based on an ontology. Compatibility with the QSTRR (Qualitative Spatio-Temporal Representation and Reasoning) framework allows for qualitative reasoning using the model. The ontology also integrates HDM (Humanitarian Data Model) to allow easy use of geographical data. A parallel implementation of the model creation algorithms is shown. Examples of reasoning using the model are described.

11

Distributed multi-node, multi-GPU, heterogeneous system for 3D image reconstruction in Electrical Capacitance Tomography – network performance and application analysis

Kapusta P., Majchrowicz M., Sankowski D., Jackowska-Strumiłło L., Banasiak R.

Przegląd Elektrotechniczny

|

2013

|

R. 89, nr 2b

339--342

EN

3D ECT provides a lot of challenging computational issues as image reconstruction requires execution of many basic operations of linear algebra, especially when the solutions are based on Finite Element Method. In order to reach real-time reconstruction a 3D ECT computational subsystem has to be able to transform capacitance data into image in fractions of seconds. By performing computations in parallel and in a distributed, heterogeneous, multi-GPU environment a significant speed-up can be achieved. Nevertheless performed tests clearly illustrate the need for developing a highly optimized distributed platform, which would mitigate existing hardware and software limitations.

PL

3D ECT zapewnia wiele złożonych problemów obliczeniowych, jako, że rekonstrukcja obrazu wymaga wykonania wielu podstawowych operacji algebry liniowej, zwłaszcza, gdy rozwiązania oparte są na Metodzie Elementów Skończonych. W celu osiągnięcia rekonstrukcji w czasie rzeczywistym system obliczeniowy musi być zdolny do przekształcania danych pomiarowych na obraz w ułamkach sekund. Poprzez wykonywanie obliczeń w sposób równoległy, z wykorzystaniem rozproszonego środowiska heterogenicznego multi-GPU można uzyskać znaczne ich przyspieszenie. Niemniej przeprowadzone badania wyraźnie pokazują potrzebę opracowania wysoce zoptymalizowanej, rozproszonej platformy, która pozwoliłaby na ominięcie istniejących ograniczeń sprzętowych i programowych.

12

Comcutejs: a Web browser based platform for large-scale computations

Dębski R., Krupa T., Majewski P.

Computer Science

|

2013

|

Vol. 14 (1)

143--152

EN

The paper presents a new, cost-effective, volunteer-computing-based platform. It utilizes volunteers’ web browsers as computational nodes. The computational tasks are delegated to the browsers and executed in the background (independently of any user interface scripts), making use of the HTML5 web workers technology. The capabilities of the platform have been demonstrated by experiments performed over a wide range of numbers of computational nodes (1–400).

13

Parallel computations of the step response of a floor heater with the use of a graphics processing unit. Part 2: results and their evaluation

Gołębiowski J., Forenc J.

Bulletin of the Polish Academy of Sciences. Technical Sciences

|

2013

|

Vol. 61, nr 4

949--954

EN

Using the models and algorithms presented in the first part of the article, the spatio-temporal distribution of the step response of a floor heater was determined. The results are presented in the form of heating curves and temperature profiles of the heater at selected time instants. The computation results were verified by comparing them with the solution obtained with a commercial program, NISA. Additionally, the distribution of the average time constant of the thermal processes occurring in the heater was determined. The use of a graphics processing unit in numerical computations based on the conjugate gradient method was analyzed. It was shown that using a graphics processing unit is profitable when solving linear systems of equations with dense coefficient matrices. In the case of a sparse matrix, the speed-up depends on the number of its non-zero elements.

14

Parallel computations of the step response of a floor heater with the use of a graphics processing unit. Part 1: models and algorithms

Gołębiowski J., Forenc J.

Bulletin of the Polish Academy of Sciences. Technical Sciences

|

2013

|

Vol. 61, nr 4

943--948

EN

The article presents a method of computing the step response of an air floor heater. The method implements parallel algorithms on a graphics processing unit. Heating ducts are placed in the analyzed concrete slab. Hot air is transferred through them, so that heat penetrates into the slab. Heat transfer into the environment takes place on the top surface of the floor by natural convection and radiation. The bottom surface of the slab is thermally insulated. A two-dimensional heat equation was discretized using the implicit finite difference method. The conjugate gradient method was used to solve the resulting system of equations. Moreover, in order to examine the possibility of shortening the computation time, the algorithm of this method was implemented on a graphics processing unit. A computer program was developed using the CUDA parallel computing platform and the linear algebra libraries CUBLAS and CUSPARSE.
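
The conjugate gradient method mentioned in this abstract can be sketched in a few lines. The version below is a plain, sequential, dense-matrix illustration in pure Python, not the CUDA/CUBLAS/CUSPARSE implementation described in the paper; the function names and the small 2x2 test system are assumptions for illustration. The matrix-vector product and the dot products are the operations that a GPU implementation would typically offload.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-12, max_iter=1000):
    """Solve A x = b for a symmetric positive definite matrix A."""
    x = [0.0] * len(b)
    r = list(b)              # residual b - A x for the initial guess x = 0
    p = list(r)              # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)                      # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]     # next conjugate direction
        rs_old = rs_new
    return x


# Small hypothetical SPD system; the exact solution is (1/11, 7/11).
solution = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0])
print(solution)
```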

15

Vortex particle method and parallel computing

Kosior A., Kudela H.

Journal of Theoretical and Applied Mechanics

|

2012

|

Vol. 50 nr 1

285-300

EN

This paper presents numerical results for a three-dimensional simulation of the motion of a vortex ring. The Vortex-In-Cell method was chosen for the simulation and is briefly described in the paper. The numerical results were obtained on a single-processor (x86) architecture. The disadvantage of single-processor computation is the very long computation time. To manage this problem, we switched to a parallel architecture. In our first approach to the multicore architecture, we tested algorithms for solving the algebraic system of equations that results from discretization of the Poisson equation. We present the results obtained with the CUDA architecture. To give a better understanding of how parallel algorithms work on the CUDA architecture, a scheme of the device and the way programs are executed on it are briefly presented. We also show our results on the parallelization of some simple iterative methods for solving the algebraic system, such as the Jacobi method and the Red-Black Gauss-Seidel method. The results were encouraging. For the Red-Black Gauss-Seidel method on a GTX480 card, the calculations were 90 times shorter than on a single processor. As is known, solving the Poisson equation numerically is equivalent to solving the algebraic system.

PL

W pracy przedstawiono wyniki numeryczne ruchu trójwymiarowego pierścienia wirowego. W obliczeniach zastosowano metodę cząstek wirowych, która została pokrótce opisana. Obliczenia przeprowadzono na pojedynczym procesorze (x86). Wadą takiej realizacji jest długi czas obliczeń. Dla przyspieszenia obliczeń zaproponowano algorytm obliczeń równoległych w środowisku wieloprocesorowym karty graficznej z technologią CUDA. Architekturę karty krótko opisano. Znajomość architektury ma istotne znaczenie dla efektywności kodu. Napisany program przetestowano, rozwiązując układ równań algebraicznych otrzymany po dyskretyzacji równania Poissona. Przedstawiono wyniki obliczeń dla zrównoleglonych, prostych metod iteracyjnych rozwiązywania układów równań takich jak metoda Jacobiego czy „Red-Black Gauss-Seidel”. Dla metody „Red-Black Gauss-Seidel” oraz karty GTX480 otrzymano 90-krotne przyspieszenie czasu obliczeń względem pojedynczego procesora.
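
The Red-Black Gauss-Seidel iteration named in this entry can be illustrated on a small two-dimensional Poisson problem. The sketch below is sequential pure Python, not the authors' three-dimensional CUDA code; the grid size, the test problem, and the function name are assumptions for illustration. The point of the colouring is that every "red" grid point depends only on "black" neighbours and vice versa, so all points of one colour can be updated independently, which is what makes the method suitable for a GPU.

```python
def red_black_gauss_seidel(u, f, h, sweeps):
    """Solve the 5-point discretization of laplace(u) = f; boundary values of u stay fixed."""
    n = len(u)
    for _ in range(sweeps):
        for color in (0, 1):                  # 0 = red points, 1 = black points
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    if (i + j) % 2 == color:
                        u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                          + u[i][j - 1] + u[i][j + 1]
                                          - h * h * f[i][j])
    return u


# Hypothetical test problem on the unit square: u = x^2 + y^2, so laplace(u) = 4.
n = 9
h = 1.0 / (n - 1)
f = [[4.0] * n for _ in range(n)]
u = [[0.0] * n for _ in range(n)]
for k in range(n):
    for i, j in ((0, k), (n - 1, k), (k, 0), (k, n - 1)):
        u[i][j] = (i * h) ** 2 + (j * h) ** 2   # exact values on the boundary

u = red_black_gauss_seidel(u, f, h, sweeps=300)
print(u[4][4])   # value at the centre of the grid, (x, y) = (0.5, 0.5)
```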

16

Zastosowanie techniki zrównoleglenia obliczeń do poprawy wydajności numerycznego algorytmu rozwiązywania równań różniczkowych

Gardecki A.

Elektronika : konstrukcje, technologie, zastosowania

|

2012

|

Vol. 53, nr 12

49-51

PL

W artykule przedstawiono analizę wydajności obliczeniowej metody typu predyktor-korektor (PECE) służącej do rozwiązywania układów równań różniczkowych zwyczajnych (ODE) w układach równoległych lub wielowątkowych. Zastosowanie algorytmów równoległych w analizie zagadnień początkowych jest szczególnie opłacalne, gdy obliczanie funkcji prawej strony równania różniczkowego jest kosztowne lub liczba równań układu jest duża (np. w analizie dużych układów elektrycznych).

EN

The paper presents a performance analysis of a predictor-corrector (PECE) method for solving systems of ordinary differential equations (ODE) in parallel or multithreaded systems. The use of parallel algorithms in the analysis of initial value problems is cost-effective when evaluating the right-hand-side function of the differential equation is costly or the system of equations is large (e.g. in the analysis of large electrical systems).

17

Zastosowanie standardu MPI w systemie wielordzeniowym do analizy stanów nieustalonych obwodów elektrycznych

Forenc J.

Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska

|

2011

|

nr 4

29-32

PL

W pracy przedstawiono równoległą metodę analizy stanów nieustalonych obwodów elektrycznych zaimplementowaną w systemie wielordzeniowym. Do komunikacji pomiędzy działającymi procesami zastosowano standard przesyłania komunikatów MPI. W praktycznym przykładzie analizy stanu nieustalonego otrzymano dobrą dokładność obliczeń oraz skrócenie ich czasu w porównaniu z obliczeniami sekwencyjnymi.

EN

This paper presents a parallel method for the transient analysis of electrical circuits, implemented on a multi-core system. Communication among the running processes was carried out using the MPI standard. In a practical example of transient analysis, good accuracy of results and a shorter computation time compared with the sequential method were obtained.

18

LMS algorithms parallelization in GPGPU environment

Bożejko W., Dobrucki A., Walczyński M.

Elektronika : konstrukcje, technologie, zastosowania

|

2011

|

Vol. 52, nr 5

49-53

EN

In this work we propose a methodology for parallelizing the class of LMS filters used in digital signal processing for noise reduction and echo cancellation problems. The proposed approach uses GPGPU technology. The parallel approach allows a problem to be decomposed into a number of smaller ones, which can be computed faster. The results obtained (especially the gains in speed and efficiency) show that the parallel method implemented on a GPU is much more effective than other existing procedures, which makes it useful in real-time systems.

PL

W niniejszej pracy proponujemy metodologię zrównoleglenia algorytmów LMS, które są używane w systemach czasu rzeczywistego np. w procesie redukcji hałasu, czy też likwidacji echa. Proponujemy równoległe podejście, które korzysta z technologii GPGPU. Takie podejście pozwala zdekomponować problem na kilka mniejszych, które mogą zostać policzone szybciej. Uzyskane wyniki (w szczególności zwiększenie szybkości i efektywności) pokazują, że realizowana na GPU metoda równoległa jest bardziej efektywna niż inne istniejące procedury, co czyni ją szczególnie użyteczną w systemach czasu rzeczywistego.
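
For readers unfamiliar with the LMS filter that this entry parallelizes, the following is a minimal sequential sketch of the standard LMS weight update, w <- w + mu * e * x. It does not reproduce the authors' GPGPU decomposition; the function name, the step size `mu`, and the two-tap test system are assumptions for illustration.

```python
import random


def lms_filter(x, d, n_taps, mu):
    """Adapt FIR weights w so that the filter output for input x tracks the desired signal d."""
    w = [0.0] * n_taps
    errors = []
    for n in range(n_taps - 1, len(x)):
        window = [x[n - k] for k in range(n_taps)]           # most recent samples first
        y = sum(wk * xk for wk, xk in zip(w, window))        # filter output
        e = d[n] - y                                         # estimation error
        w = [wk + mu * e * xk for wk, xk in zip(w, window)]  # LMS weight update
        errors.append(e)
    return w, errors


# Hypothetical system identification test: d[n] = 0.5 x[n] - 0.3 x[n-1].
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
d = [0.5 * x[n] - 0.3 * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
weights, errors = lms_filter(x, d, n_taps=2, mu=0.05)
print(weights)
```

The inner products over the tap window and the element-wise weight update are the parts with obvious data parallelism, while the sample-by-sample recursion is inherently sequential.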

19

Using GPU to improve performance of calculating recurrence plot

Rybak T.

Zeszyty Naukowe Politechniki Białostockiej. Informatyka

|

2010

|

Z. 6

77-94

EN

Simulation and analysis of sophisticated systems require a great deal of computation. Moore’s law, although it still allows the number of transistors on a die to increase, no longer leads to higher performance of a single chip; instead it leads to increased parallelism of the entire system. This allows the performance of algorithms that can be parallelised to be improved; the recurrence plot is one such algorithm. Graphics Processing Units (GPUs) show the largest increase in parallel computation capabilities. At the same time, they do not behave like traditional CPUs and require a different style of programming to fully utilise their capabilities. The article shows techniques that can be used to speed up the computation of recurrence plots on a GPGPU.

PL

Analiza skomplikowanych systemów wymaga przeprowadzenia wielu obliczeń. Prawo Moore’a, choć wciąż pozostaje w mocy, nie pozwala na zwiększanie wydajności pojedynczego procesora, ale pomaga w tworzeniu wydajnych równoległych systemów. Pozwala to na zwiększanie wydajności dla algorytmów które można zrównoleglić; recurrence plot należy do takich algorytmów. Procesory graficzne (GPU) oferują największą ilość równoległych jednostek obliczeniowych, jednocześnie jednak ich wydajne wykorzystanie wymaga innego podejścia programistycznego. Artykuł opisuje w jaki sposób wykorzystano technologię CUDA do przyśpieszania obliczania recurrence plot.
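
A recurrence plot marks the pairs of time indices at which a series returns close to an earlier state. The sketch below computes the recurrence matrix for a scalar series with a fixed threshold; the use of a scalar series without embedding, the threshold `eps`, and the function name are assumptions for illustration and do not reflect the CUDA implementation described in the article. Every entry (i, j) depends only on two samples, so all entries can be computed independently, which is why the algorithm parallelises well.

```python
def recurrence_matrix(series, eps):
    """R[i][j] = 1 if samples i and j are within eps of each other, else 0."""
    n = len(series)
    return [[1 if abs(series[i] - series[j]) <= eps else 0 for j in range(n)]
            for i in range(n)]


# Hypothetical periodic series: states recur every second sample.
rp = recurrence_matrix([0.0, 1.0, 0.0, 1.0], eps=0.1)
for row in rp:
    print(row)
```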

20

The Electromagnetic Field Parallel Cluster Simulation on an Anatomic Human Head Model

Walendziuk W., Tarasow E., Idzikowki A.

Przegląd Elektrotechniczny

|

2010

|

R. 86, nr 3

92-93

EN

The paper demonstrates a method for computing electromagnetic field scattering in a numerical model of the human head. The work presents the structure of a model based on MRI (Magnetic Resonance Imaging). The FDTD (Finite-Difference Time-Domain) method was implemented in the parallel program used for the computations. The software was run on a three-node cluster of workstations (COW).

PL

W niniejszym artykule przedstawiono metodę obliczania propagacji pola elektromagnetycznego w anatomicznym modelu głowy ludzkie. W pracy przedstawiono strukturę badanego modelu opartego na obrazach, otrzymanych metodą rezonansu magnetycznego (MRI). Do obliczeń wykorzystano program równoległy, zaimplementowany w sieci klaster, opartej na trzech węzłach, który opracowano na podstawie metody różnic skończonych w dziedzinie czasu FDTD (Finite-Difference Time-Domain