BazTech - Yadda

1

Parallel Algorithms for Minimal Nondeterministic Finite Automata Inference

Jastrzab Tomasz, Czech Zbigniew J., Wieczorek Wojciech

Fundamenta Informaticae

|

2021

|

Vol. 178, nr 3

203--227

EN

The goal of this paper is to develop the parallel algorithms that, on input of a learning sample, identify a regular language by means of a nondeterministic finite automaton (NFA). A sample is a pair of finite sets containing positive and negative examples. Given a sample, a minimal NFA that represents the target regular language is sought. We define the task of finding an NFA, which accepts all positive examples and rejects all negative ones, as a constraint satisfaction problem, and then propose the parallel algorithms to solve the problem. The results of comprehensive computational experiments on the variety of inference tasks are reported. The question of minimizing an NFA consistent with a learning sample is computationally hard.

2

Zastosowanie klastrów komputerowych w celu zwiększenia efektywności obliczeniowej przedsiębiorstw prognozujących rozprzestrzenianie się zanieczyszczeń w atmosferze

Majer M., Biniasz D.

Logistyka

|

2015

|

nr 3

3020--3029, CD 1

PL

Artykuł zawiera analizę możliwości zwiększenia konkurencyjności i wydajności przedsiębiorstwa zajmującego się wykonywaniem analiz i obliczeń związanych z symulacjami rozprzestrzeniania się zanieczyszczeń w atmosferze poprzez zastosowanie komputerów klastrowych, jako narzędzia minimalizującego czas obliczeń. Autor opisuje również zagadnienia związane z podstawami modelowania systemów monitorujących rozprzestrzenianie się zanieczyszczeń w powietrzu atmosferycznym. Opisane zostały również podstawowe podziały zanieczyszczeń, typy źródeł zanieczyszczeń i rodzaje emiterów. W dalszej części artykułu opisano podstawowe typy opisywanych modeli, jak również możliwe ich implementacje programistyczne, a także sposoby optymalizacji i usprawnienia ich działania poprzez zastosowanie klastrów komputerowych podczas wykonywania obliczeń.

EN

This article contains information about modeling of pollutants dispersion systems in the atmosphere. It also describes main types of pollutants, emission points and analysis of typical atmosphere pollutant dispersion models. It shows a way of using computer cluster systems in modeling process and a proper programming libraries for parallel computing. It also contains information about computational efficiency’s increase possibilities and computer cluster systems using profits for computational industries.

3

Family of Parallel LMS-based Adaptive Algorithms of Echo Cancellation

Dobrucki A., Walczyński M., Bożejko W.

Computational Methods in Science and Technology

|

2015

|

Vol. 21, No. 4

191--200

EN

In this paper we propose a number of new, genuine parallel LMS-based adaptive algorithms in the context of their use in the acoustic echo cancellation. The most complex parts of the LMS-based algorithms were determined and parallelized. A number of genuine parallelization methods were proposed taking into consideration varied types of architectures of modern concurrent computing environment, such as GPUs, clusters of workstations and cloud computing.

4

A parallel algorithm for the decomposition of finite languages

Jastrząb T., Czech Z. J.

Studia Informatica

|

2014

|

Vol. 35, nr 4

5--16

PL

Język skończony jest dekomponowalny, jeżeli może zostać zapisany jako złożenie dwóch niepustych języków. W niniejszym artykule zaproponowany został równoległy algorytm dekompozycji języków skończonych. Efektywność przedstawionego algorytmu została oceniona na podstawie eksperymentów przeprowadzonych dla wybranych języków.

EN

A finite language is said to be decomposable, if it can be written as a catenation of two non-empty languages. In this paper a parallel algorithm for finding the decomposition of finite languages is proposed. The effectiveness of the algorithm is assessed based on the experimental results provided for selected languages.

5

A comparative analysis of two parallel algorithms for finite language decomposition

Jastrząb T.

Pomiary Automatyka Kontrola

|

2014

|

R. 60, nr 6

350--353

EN

A finite language is said to possess a non-trivial decomposition if it can be represented as a catenation of two non-empty languages. In this paper two parallel versions of a known sequential algorithm for finding the decomposition of finite languages are proposed. The effectiveness of the algorithms is estimated based on the experimental results obtained for several sample languages.

PL

Problem dekompozycji języków skończonych jest rozstrzygalny, chociaż trudny do rozwiązania. Problem sprowadza się do wyznaczenia pary niepustych języków skończonych L1, L2, takich że w wyniku operacji ich złożenia powstaje język wejściowy. Każdy język skończony posiada dekompozycję trywialną, a oprócz niej zero lub więcej dekompozycji nietrywialnych. Ze względu na brak algorytmu pozwalającego na wyznaczenie zbioru dekompozycji dla dowolnego języka, w artykule zaproponowano rozwiązanie oparte na przeszukiwaniu wyczerpującym z obcinaniem przestrzeni rozwiązań. Artykuł przedstawia dotychczasowe próby rozwiązywania problemu dekompozycji języków skończonych z wykorzystaniem algorytmów sekwencyjnych (rys. 1) oraz równoległych (rys. 2 i 3). Na podstawie znanych algorytmów opracowano ulepszone wersje algorytmu równoległego (rys. 4 i 5). W zaproponowanym rozwiązaniu skoncentrowano się na minimalizacji narzutu czasowego wynikającego z komunikacji pomiędzy procesami. Dokonana ocena efektywności opracowanych algorytmów oparta została o pomiary czasu wykonania dla implementacji z użyciem biblioteki MPI. Uzyskane wyniki (tab. 1), a w szczególności przyspieszenia pozwalają na ocenę rozwiązań jako nie w pełni zadowalających, w odniesieniu do wykorzystanej liczby procesorów.

6

Very Fast Non-Dominated Sorting

Smutnicki C., Rudy J., Żelazny D.

Decision Making in Manufacturing and Services

|

2014

|

Vol. 8, no. 1-2

13--23

EN

A new and very efficient parallel algorithm for the Fast Non-dominated Sorting of Pareto fronts is proposed. By decreasing its computational complexity, the application of the proposed method allows us to increase the speedup of the best up to now Fast and Elitist Multi-Objective Genetic Algorithm (NSGA-II) more than two orders of magnitude. Formal proofs of time complexities of basic as well as improved versions of the procedure are presented. The provided experimental results fully confirm theoretical findings.

7

The comparison of parallel sorting algorithms implemented on different hardware platforms

Żurek D., Pietroń M., Wielgosz M., Wiatr K.

Computer Science

|

2013

|

Vol. 14 (4)

679--691

EN

Sorting is a common problem in computer science. There are a lot of well-known sorting algorithms created for sequential execution on a single processor. Recently, many-core and multi-core platforms have enabled the creation of wide parallel algorithms. We have standard processors that consist of multiple cores and hardware accelerators, like the GPU. Graphic cards, with their parallel architecture, provide new opportunities to speed up many algorithms. In this paper, we describe the results from the implementation of a few different parallel sorting algorithms on GPU cards and multi-core processors. Then, a hybrid algorithm will be presented, consisting of parts executed on both platforms (a standard CPU and GPU). In recent literature about the implementation of sorting algorithms in the GPU, a fair comparison between many core and multi-core platforms is lacking. In most cases, these describe the resulting time of sorting algorithm executions on the GPU platform and a single CPU core.

8

GPU-based tuning of quantum-inspired genetic algorithm for a combinatorial optimization problem

Nowotniak R., Kucharski J.

Bulletin of the Polish Academy of Sciences. Technical Sciences

|

2012

|

Vol. 60, nr 2

323-330

EN

This paper concerns efficient parameters tuning (meta-optimization) of a state-of-the-art metaheuristic, Quantum-Inspired Genetic Algorithm (QIGA), in a GPU-based massively parallel computing environment (NVidia CUDATMtechnology). A novel approach to parallel implementation of the algorithm has been presented. In a block of threads, each thread transforms a separate quantum individual or different quantum gene; In each block, a separate experiment with different population is conducted. The computations have been distributed to eight GPU devices, and over 400× speedup has been gained in comparison to Intel Core i7 2.93GHz CPU. This approach allows efficient meta-optimization of the algorithm parameters. Two criteria for the meta-optimization of the rotation angles in quantum genes state space have been considered. Performance comparison has been performed on combinatorial optimization (knapsack problem), and it has been presented that the tuned algorithm is superior to Simple Genetic Algorithm and to original QIGA algorithm.

9

Zwektoryzowane algorytmy obliczania Transformaty S dla środowiska wieloprocesorowego

Tariov A., Gliszczyński M.

Pomiary Automatyka Kontrola

|

2011

|

R. 57, nr 11

1401-1403

PL

W pracy został zaprezentowany wektoryzowany algorytm obliczania transformaty S w dwóch wariantach - w postaci sekwencyjno-równoległej pozwalającej na oszczędzenie zasobów sprzętowych oraz w postaci równoległej pozwalającej wykorzystać, nowoczesne wielordzeniowe platformy obliczeniowe. W drugim przypadku możliwa jest znaczna redukcja czasu trwania algorytmu. Obie metody mogą znaleźć zastosowanie praktyczne zależnie od oczekiwanej dokładności (rozdzielczości) i szybkości działania jak też możliwości platformy obliczeniowej.

EN

In the paper the algorithm for calculating N by N-point S Transform is presented. In a sequential, recursive option hardware resources saving is available, while on the other hand, a parallel version of the algorithm allows increasing the accuracy and reducing the time when using multi-core platforms. Two of these approaches can be implemented in practical use depending on the expected accuracy, speed and power of the hardware platform. At the beginning of the paper uses of S Transform with other similar solutions are described. Advantages and disadvantages of S Transform, which are good properties of the time-frequency analysis of non-stationary signals thanks to a movable, different sized Gaussian window, but at the same time a long computation time of the standard, sequential method, are considered. Next, the theoretical, continuous form of the transform and the discrete form with the sequential algorithm are presented. Later The main part of the work deals with synthesis of the sequential and parallel version of the algorithm in the matrix-vector form. The data flow in the algorithms in space and time is shown in Figs. 1 and 2 (for sequential and parallel approach). Finally, the computation times of two versions are compared. The advantage of the two presented approaches is simple and understandable tensor product representation which makes the implementation easy. The sequential algorithm can be used for slower platforms, where the real time analysis is not necessary, while the parallel version offers quick computation on multi-core processors.

10

Równoległe implementacje algorytmu Gaussa-Seidela w środowisku OpenMP

Sadecki J.

Pomiary Automatyka Kontrola

|

2011

|

R. 57, nr 3

301-304

PL

W artykule przedstawiono przykładowe rezultaty analizy efektywności równoległych realizacji algorytmu Gaussa-Seidela zaimplementowanych w środowisku procesorów wielordzeniowych. Jak pokazano, standardowa równoległa implementacja tego algorytmu, prowadzi do gorszych w sensie szybkości zbieżności wyników w porównaniu do sekwencyjnej wersji tej metody. Zaproponowana nowa wersja równoległa metody Gaussa-Seidela posiada analogiczną szybkość zbieżności jak jej realizacja sekwencyjna, zachowując przy ty łatwość implementacji równoległej. W artykule przedstawiono przykładowe rezultaty obliczeń przeprowadzonych przy wykorzystaniu procesora czterordzeniowego. Rozważana implementacja algorytmu Gaussa-Seidela posiada też możliwości jej zastosowania dla szerszej niż rozważana w pracy klasy problemów optymalizacji.

EN

The paper presents results of the efficiency analysis for some parallel realization of optimisation algorithms in multicore processors. The results concern a simple Gauss-Seidel optimization algorithm. In the paper both standard parallel and new parallel implementations of the Gauss-Seidel algorithm are presented. As it is pointed out, the standard parallel algorithm leads to worse numerical results (in terms of the rate of computation convergence) than the sequential version of this algorithm. The new parallel algorithm achieves the same numerical ef?ciency of computations as the sequential algorithm and, additionally, can be aesily implemented in multicore processors. It is prooved that, for the quadratic optimization problem, the modified parallel Gauss-Seidel algorithm leads to the same computational results as for the sequential implementation of the method. Some examples of parallel implementations of the method in fourcore processors are presented. The proposed new algorithm enables achieving good efficiency of parallel computations both in terms of the execution time and the speedup factor value. The new algorithm can also be used to solve broader classes of optimization problems, which in the nearest neighbourhood of the optimal solution can be sufficiently precisely approximated by the square function.

11

Efficiency of the computer tomography algorithms in examination of the internal structure of materials with non-transparent elements by using the incomplete information

Grzymkowski R., Hetmaniok E., Pleszczyński M.

Journal of Achievements in Materials and Manufacturing Engineering

|

2011

|

Vol. 49, nr 2

285--298

EN

Purpose: of this paper: The effectiveness of computer tomography algorithms applied for reconstructing the internal structure of objects containing the non-transparent elements is discussed, in conditions of the incomplete information about the examined object. Design/methodology/approach: Problem of the internal structure examination of an object with non-transparent elements, without its destruction, is considered by means of the classical and non-classical algebraic algorithms of computer tomography used in standard approaches and in cases of incomplete projection data. Findings: Computer tomography algorithms, known from literature and designed by the authors, are tested in solving the problems of reconstructing the discrete objects of high contrast with non-transparent elements, with regard to their precision, convergence and utility. Carried out research indicate that the chaotic algorithms are more efficient, for the same values of parameters, in comparison with the corresponding classical algorithms. Practical implications: Problems considered in the paper can arise in some technical issues, for example, in exploring the coal interlayers in looking for the compressed gas reservoirs which can be dangerous for the people’s life and health, in which application of the standard algorithms of computer tomography is impossible for some reasons (like size of the examined object, its localization or its accessibility). Originality/value: In the paper the originally designed by the authors reconstruction algorithms are presented which appear to be more effective than the standard algebraic algorithms adapted for solving problems with the incomplete projection data.

12

GPU-based massively parallel implementation of metaheuristic algorithms

Nowotniak R., Kucharski J.

Automatyka / Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie

|

2011

|

T. 15, z. 3

595-611

EN

In this paper, implementation of Quantum-Inspired Genetic Algorithm(QIGA) in massively parallel environment (Graphics Processing Units) has been presented. Contrary to many recent papers concerning parallel implementation of evolutionary algorithms, in this paper a novel approach has been taken. QIGA algorithm has been implemented entirely as a computational kernel. Parallelization of the algorithm has been performed on two levels: In a block of threads, each thread transforms a separate individual or different gene; In each block, separate populations with same or different parameters are evolved. Finally, the computations have been distributed to eight GPU devices, and over 400x speedup has been gained in comparison to sequential implementation of the algorithm in ANSI C on one Intel Core i7 2.93 GHz CPU core. Correctness of the results has been verified in statistical analysis. The presented approach can be applied to experimentation with a broad class of metaheuristics.

PL

W artykule zostały przedstawione szczegóły implementacji kwantowo inspirowanego algorytmu genetycznego (QIGA) w środowisku obliczeń masowo równoległych na procesorach kart graficznych. W odróżnieniu od wielu dotychczasowych opracowań, prezentujących implementacje algorytmów ewolucyjnych w środowiskach obliczeń równoległych, w niniejszym artykule zostało zaproponowane nowatorskie podejście do implementacji algorytmu ewolucyjnego. Zrównoleglenie algorytmu zostało wykonane na dwóch poziomach: poszczególne osobniki w populacji lub poszczególne geny są przetwarzane przez osobne wątki w blokach, a w poszczególnych blokach przeprowadzany jest proces ewolucji populacji o tych samych lub różnych parametrach. Obliczenia zostały rozdzielone na osiem jednostek GPU, co pozwoliło na uzyskanie ponad 400-krotnego przyśpieszenia algorytmu w stosunku do sekwencyjnej implementacji w języku ANSI C na pojedynczym rdzeniu procesora Intel Core i7 2,93 GHz. Poprawność implementacji została zweryfikowana poprzez analizę statystyczną otrzymanych wyników. Zaproponowane podejście pozwala przyśpieszyć badanie dowolnych metaheurystyk przeszukiwania.

13

Load balancing in parallel implementation of vascular network modeling

Jurczuk K., Krętowski M., Bézy-Wendling J.

Zeszyty Naukowe Politechniki Białostockiej. Informatyka

|

2010

|

Z. 6

41-61

EN

In this paper, load balancing mechanisms in a parallel algorithm of vascular network development are investigated. The main attention is focused on the perfusion process (connection of new cells to vascular trees) as it is the most time demanding part of the vascular algorithm. We propose several techniques that aim at balancing load among processors, decreasing their idle time and reducing the communication overhead. The core solution is based on the centralized dynamic load balancing approach. The model behaviors are analyzed and a tradeoff between the different mechanisms is found. The proposed mechanisms are implemented on a computing cluster with the use of the message passing interface (MPI) standard. The experimental results show that the introduced improvements provide a more efficient solution and consequently further accelerate the simulation process.

PL

W artykule rozważane są mechanizmy zrównoważające obciążenie w równoległym algorytmie rozwoju sieci naczyń krwionośnych. Główną uwagę zwrócono na proces perfuzji (podłączanie nowych komórek do drzew krwionośnych) jako, że proces ten jest najbardziej czasochłonnym fragmentem rozpatrywanego algorytmu. Zaproponowane przez autorów rozwiązania mają na celu zrównoważenie obciążenia pomiędzy procesorami, skrócenie ich czasu bezczynności oraz zredukowanie narzutu komunikacyjnego. Jądro rozwiązania jest oparte na scentralizowanym dynamicznym podejściu równoważenia obciążenia. Zachowania modelu zostały przeanalizowane i kompromis pomiędzy różnymi technikami został zaproponowany. Przedstawione mechanizmy zostały zaimplementowane na klastrze obliczeniowym przy wykorzystaniu standardu MPI. Otrzymane rezultaty jednoznacznie pokazuja˛ iż wprowadzone usprawnienia zapewniają bardziej efektywne rozwiązanie co w konsekwencji pozwala na jeszcze większe przyśpieszenie procesu symulacji.

14

Application of the computer tomography in examination of the internal structure of materials by considering the specific conditions of the problem

Grzymkowski R., Hetmaniok E., Pleszczyński M., Zielonka A.

Journal of Achievements in Materials and Manufacturing Engineering

|

2010

|

Vol. 43, nr 1

288--298

EN

Purpose of this paper: In this paper we present a summary of the results reached in the field of computer tomography applied in some special case – for the problem of incomplete projection data. This particular problem arises in the technical issues in which, for some reasons (like for example size of the examined object, its localization or its accessibility), it is impossible to apply the standard algorithms of computer tomography. Design/methodology/approach: In the paper we discuss the standard algebraic algorithms of computer tomography and, additionally, the new algebraic algorithms (parallel and chaotic), designed by the authors, suitable not only for the case of incomplete projection data but also useful in the standard approach. Findings: The above mentioned algorithms are tested in solving the problems of reconstruction the discrete objects of high-contrast. Moreover, convergence, stability and utility of the algorithms are proved experimentally. Research limitations/implications: Algorithms, created by the authors, are designed for the multiprocessor computers which allow to execute the calculations simultaneously. However, the results compiled in the paper were elaborated by using the one-processor computer. Calculations in which the parallel computing structure will be used are planned for the nearest future.Practical implications: Possibilities of the effective applications of the discussed algorithms in different practical technical problems are showed in the paper. Research, done till now, indicate the chances of applying the proposed algorithms in certain technical problem in which the incomplete projection data appear (like, for example, in searching for the elements in material which cause decreasing of its strength or in looking for the compressed gas reservoirs in the coal bed, which can be dangerous for the people’s life and health). Originality/value: The paper presents the reconstruction algorithms (block and chaotic-bloc), designed by the authors, which appear to be more effective than the standard algebraic algorithms adapted for solving problems with the incomplete projection data.

15

Evolutionary Rough Parallel Multi-Objective Optimization Algorithm

Maulik U., Sarkar A.

Fundamenta Informaticae

|

2010

|

Vol. 99, nr 1

13-27

EN

A hybrid unsupervised learning algorithm, which is termed as Parallel Rough-based Archived Multi-Objective Simulated Annealing (PARAMOSA), is proposed in this article. It comprises a judicious integration of the principles of the rough sets theory and the scalable distributed paradigm with the archived multi-objective simulated annealing approach. While the concept of boundary approximations of rough sets in this implementation, deals with the incompleteness in the dynamic classification method with the quality of classification coefficient as the classificatory competencemeasurement, the time-efficient parallel approach enables faster convergence of the Pareto-archived evolution strategy. It incorporates both the rough set-based dynamic archive classification method and the distributed implementation as a two-phase speedup strategy in this algorithm. A measure of the amount of domination between two solutions has been incorporated in this work to determine the acceptance probability of a new solution with an improvement in the spread of the non-dominated solutions in the Pareto-front by adopting rough sets theory. A complexity analysis of the proposed algorithm is provided. An extensive comparative study of the proposed algorithm with three other existing and well-known Multi-Objective Evolutionary Algorithms (MOEAs) demonstrate the effectiveness of the former with respect to four existing performance metrics and eleven benchmark test problems of varying degrees of difficulties. The superiority of this new parallel implementation over other algorithms also has been demonstrated in timing, which achieves a near optimal speedup with a minimal communication overhead.

16

Integracja wielu platform i systemów operacyjnych, za pomocą mechanizmów RPC dla zrównoleglenia rozwiązania złożonych algorytmów

Zghidi H., Świtoński A., Adamczyk B.

Studia Informatica

|

2009

|

Vol. 30, nr 4

67-80

PL

Celem niniejszej pracy jest wykorzystanie mechanizmów zdalnego wywołania procedur do integracji wielu platform komputerowych do rozwiązania złożonych algorytmów obliczeniowych. W tym celu wykorzystano heterogeniczną sieć komputerową, składającą się z komputerów o różnej architekturze i pracujących pod kontrolą różnych systemów operacyjnych.

EN

The aim of this work is to use remote procedure call mechanisms to integrate multiple computer platforms in order to solve complicated computation algorithms. A heterogeneous computer network consisting of computers with different architectures and different operating systems.

17

Analysis of an algorithm of distributed generation of matrices for markovan models of a congestion control mechanism

Bylina J., Bylina B.

Studia Informatica

|

2008

|

Vol. 29, nr 4B

93-102

EN

We consider the parallel generation of matrices corresponding to models of congestion control mechanisms' behavior. We develop a piece of software for a cluster architecture and analyze its performance times, amount of communication, each processor's load. The resulting application is scalable and also produces a substantial speedup and efficiency.

PL

Rozważamy równoległą generację macierzy odpowiadających modelom zachowania mechanizmów kontroli zatłoczenia. Rozwijamy oprogramowanie dla architektury klastrowej i analizujemy czasy jego działania, ilość komunikacji, obciążenie każdego z procesorów. Otrzymana aplikacja jest skalowalna i daje znaczące przyśpieszenie oraz efektywność.

18

Klastry obliczeniowe. Poznawanie i wykorzystywanie za pośrednictwem sieci Internet. Środowisko MPI

Majer M.

Zeszyty Naukowe. Elektryka / Politechnika Opolska

|

2007

|

Vol. 320, z. 58

137-144

PL

W dobie rozwoju technologicznego stajemy przed coraz trudniejszymi zadaniami obliczeniowymi. Alternatywą dla pogoni za mocą obliczeniową komputera jest wykorzystanie możliwości jakie oferuje system klastrowy. W artykule omówiono możliwości dotarcia do komputera klastrowego na przykładzie Politechniki Opolskiej. Przybliżono sposób nawiązywania połączenia oraz niezbędne wymagania. Opisano również bibliotekę równoległego programowania MPI. Artykuł zawiera również analizę wydajności przykładowego algorytmu algebry liniowej w systemie klastrowym.

EN

In a high technology age we live, single PC computers are often not enough for comlitated computing problems. This article is about methods of using computer cluster systems with a proper programing libraries, for example MPI library. It also contains an example analysis of linear algebra algorithm which was tested on a cluster computer system placed at Technical University of Opole.

19

Preconditioning of voxel FEM elliptic systems

Margenov S., Vutov Y.

TASK Quarterly : scientific bulletin of Academic Computer Centre in Gdansk

|

2007

|

Vol. 11, No 1-2

117-128

EN

The presented comparative analysis concerns two iterative solvers for large-scale linear systems related to žFEM simulation of human bones. The considered scalar elliptic problems represent the strongly heterogeneous structure of real bone specimens. The voxel data are obtained with high resolution computer tomography. Non-conforming Rannacher-Turek finite elements are used to discretize of the considered elliptic problem. The preconditioned conjugate gradient method is known to be the best tool for efficient solution of large-scale symmetric systems with sparse positive definite matrices. Here, the performance of two preconditioners is studied, namely modified incomplete Cholesky factorization, MIC(0), and algebraic multigrid. The comparative analysis is mostly based on the computing times to run the sequential codes. The number of iterations for both preconditioners is also discussed. Finally, numerical tests of a novel parallel MIC(0) code are presented. The obtained parallel speed-ups and efficiencies illustrate the scope of efficient applications for real-life large-scale problems.

20

The convergence rate and parallel performance of a 3D elliptic solver

Lirkov I., Vutov V.

Systems Science

|

2006

|

Vol. 32, no 4

73-81

EN

It was shown that block-circulant preconditioners applied to a conjugate gradient method used to solve structured sparse linear systems arising from 2D or 3D elliptic problems have good numerical properties and a potential for high parallel efficiency. In this paper, the convergence rate and the parallel performance of a circulant block-factorization based preconditioner applied to a 3D problem are analyzed. A portable parallel code is developed based on Message Passing Interface (MPI) standards. The numerical tests performed on parallel computer systems demonstrate the level of efficiency of the algorithm developed.