Wyniki wyszukiwania - BazTech

1

Performance and scalability experiments with a large-scale air pollution model on the EuroHPC petascale supercomputer DISCOVERER

Ostromsky Tzvetan

Annals of Computer Science and Information Systems

|

2022

|

Vol. 32

81--84

EN

The basic parallel versions of the Danish Eulerian Model (UNI-DEM) has been implemented on the new petascale supercomputer DISCOVERER, installed last year in Sofia, Bulgaria by the company Atos. DISCOVERER is part of the European High Performance Computing Joint Undertaking (EuroHPC), which is building a network of 8 powerful supercomputers across the European Union (3 pre-exascale and 5 petascale).The results of some scalability experiments with the basic MPI and a hybrid MPI-OpenMP parallel implementations of UNI-DEM on the new Bulgarian petascale supercomputer DISCOVERER (in EuroHPC network) are presented here. They are compared with similar earlier experiments performed on the Mare Nostrum III supercomputer (petascale too) at Barcelona Supercomputing Centre - the most powerful supercomputer in Spain by that time, upgraded currently to the pre-exascale Mare Nostrum V, also part of the EuroHPC JU infrastructure.

2

Efficient simulations of large-scale convective heat transfer problems

Goik Damian, Banaś Krzysztof, Bielański Jan, Chłoń Kazimierz

Computer Science

|

2021

|

T. 22 (4)

517--538

EN

We describe an approach for efficient solution of large-scale convective heat transfer problems that are formulated as coupled unsteady heat conduction and incompressible fluid-flow equations. The original problem is discretized over time using classical implicit methods, while stabilized finite elements are used for space discretization. The algorithm employed for the discretization of the fluid-flow problem uses Picard’s iterations to solve the arising nonlinear equations. Both problems (the heat transfer and Navier–Stokes equations) give rise to large sparse systems of linear equations. The systems are solved by using an iterative GMRES solver with suitable preconditioning. For the incompressible flow equations, we employ a special preconditioner that is based on an algebraic multigrid (AMG) technique. This paper presents algorithmic and implementation details of the solution procedure, which is suitably tuned – especially for ill-conditioned systems that arise from discretizations of incompressible Navier–Stokes equations. We describe a parallel implementation of the solver using MPI and elements from the PETSC library. The scalability of the solver is favorably compared with other methods, such as direct solvers and the standard GMRES method with ILU preconditioning.

3

Multi-core and many-core SPMD parallel algorithms for construction of basins of attraction

Silveira Marcos, Gonçalves Paulo J.P., Balthazar José M.

Journal of Theoretical and Applied Mechanics

|

2019

|

Vol. 57 nr 4

1067--1079

EN

Construction of basins of attraction, used for the analysis of nonlinear dynamical systems which present multistability, are computationaly very expensive. Because of the long runtime needed, in many cases, the construction of basins does not have any practical use. Numerical time integration is currently the bottleneck of algorithms used for the construction of such basins. The integrations related to each set of initial conditions are independent of each other. The assignment of each integration to a separate thread seems very attractive, and parallel algorithms which use this approach to construct the basins are presented here. Two versions are considered, one for multi-core and another for many-core architectures, both based on a SPMD approach. The algorithm is tested on three systems, the classic nonlinear Duffing system, a non-ideal system exhibiting the Sommerfeld effect and an immunodynamic system. The results for all examples demonstrate the versatility of the proposed parallel algorithm, showing that the multi-core parallel algorithm using MPI has nearly an ideal speedup and efficiency.

4

Hybrid MPI/Open-MP acceleration approach for high-order schemes for CFD

Saczek Michał, Wawrzak Karol, Tyliszczak Artur, Boguslawski Andrzej

TASK Quarterly : scientific bulletin of Academic Computer Centre in Gdansk

|

2018

|

Vol. 22, No 3

179--193

EN

The paper presents a hybridMPI+OpenMP (Message Passing Interface/Open Multi-Processor) algorithm used for parallel programs based on the high-order compact method.The main tools used to implement parallelism in computations are OpenMP andMPI whichdiffer in terms of memory on which they are based. OpenMP works on shared-memory and the MPIon distributed-memory whereas the hybrid model is based on a combination of those methods. The tests performed and described in this paper present significant advantages provided by a combination of the MPI/OpenMP approach. The test computations needed for verifying possibilities ofMPI, Open-MP and Hybrid of both tools were carried out using anacademic high-order SAILORsolver. The obtained results seem to be very promising to accelerate simulations of fluid flows as well as for application using high order methods.

5

Using GPU Accelerators for Parallel Simulations in Material Physics

Uchroński M., Potasz P., Szymańska-Kwiecień A., Hruszowiec M.

Computational Methods in Science and Technology

|

2018

|

Vol. 24, No. 4

249--258

EN

This work is focused on parallel simulation of electron-electron interactions in materials with non-trivial topological order (i.e. Chern insulators). The problem of electron-electron interaction systems can be solved by diagonalizing a many-body Hamiltonian matrix in a basis of configurations of electrons distributed among possible single particle energy levels – the configuration interaction method. The number of possible configurations exponentially increases with the number of electrons and energy levels; 12 electrons occupying 24 energy levels corresponds to the dimension of Hilbert space about 106 . Solving such a problem requires effective computational methods and highly efficient optimization of the source code. The work is focused on many-body effects related to strongly interacting electrons on flat bands with non-trivial topology. Such systems are expected to be useful in study and understanding of new topological phases of matter, and in further future they can be used to design novel nanomaterials. Heterogeneous architecture based on GPU accelerators and MPI nodes will be used for improving performance and scalability in parallel solving problem of electron-electron interaction systems

6

Nanoparticle emissions from gasoline vehicles DI & MPI

Czerwinski J., Comte P., Güdel M., Bonsack P.

Combustion Engines

|

2017

|

R. 56, nr 3

179--187

EN

The nanoparticles (NP) count concentrations are limited in EU for all Diesel passenger cars since 2013 and for gasoline cars with direct injection (GDI) since 2014. For the particle number (PN) of MPI gasoline cars there are still no legal limitations. In the present paper some results of investigations of nanoparticles from five DI and four MPI gasoline cars are represented. The measurements were performed at vehicle tailpipe and in CVS-tunnel. Moreover, five variants of “vehicle – GPF” were investigated. The PN-emission level of the investigated GDI cars in WLTC without GPF is in the same range of magnitude very near to the actual limit value of 6.0 × 1012 1/km. With the GPF’s with better filtration quality, it is possible to lower the emissions below the future limit value of 6.0 × 1011 1/km. The modern MPI vehicles also emit a considerable amount of PN, which in some cases can attain the level of Diesel exhaust gas without DPF and can pass over the actual limit value for GDI (6.0 × 1012 1/km). The GPF-technology offers in this respect further potentials to reduce the PN-emissions of traffic.

7

Highly Scalable Quantum Transfer Matrix Simulations of Molecule-Based Nanomagnets on a Parallel IBM BlueGene/P Architecture

Antkowiak M., Kucharski Ł., Matysiak R., Kamieniarz G.

Computational Methods in Science and Technology

|

2016

|

Vol. 22, No. 2

87--93

EN

In this work we present a very efficient scaling of our two applications based on the quantum transfer matrix method which we exploited to simulate the thermodynamic properties of Cr9 and Mn6 molecules as examples of the uniform and non-uniform molecular nanomagnets. The test runs were conducted on the IBM BlueGene/P supercomputer JUGENE of the Tier-0 performance class installed in the Jülich Supercomputing Centre.

8

Zastosowanie klastrów komputerowych w celu zwiększenia efektywności obliczeniowej przedsiębiorstw prognozujących rozprzestrzenianie się zanieczyszczeń w atmosferze

Majer M., Biniasz D.

Logistyka

|

2015

|

nr 3

3020--3029, CD 1

PL

Artykuł zawiera analizę możliwości zwiększenia konkurencyjności i wydajności przedsiębiorstwa zajmującego się wykonywaniem analiz i obliczeń związanych z symulacjami rozprzestrzeniania się zanieczyszczeń w atmosferze poprzez zastosowanie komputerów klastrowych, jako narzędzia minimalizującego czas obliczeń. Autor opisuje również zagadnienia związane z podstawami modelowania systemów monitorujących rozprzestrzenianie się zanieczyszczeń w powietrzu atmosferycznym. Opisane zostały również podstawowe podziały zanieczyszczeń, typy źródeł zanieczyszczeń i rodzaje emiterów. W dalszej części artykułu opisano podstawowe typy opisywanych modeli, jak również możliwe ich implementacje programistyczne, a także sposoby optymalizacji i usprawnienia ich działania poprzez zastosowanie klastrów komputerowych podczas wykonywania obliczeń.

EN

This article contains information about modeling of pollutants dispersion systems in the atmosphere. It also describes main types of pollutants, emission points and analysis of typical atmosphere pollutant dispersion models. It shows a way of using computer cluster systems in modeling process and a proper programming libraries for parallel computing. It also contains information about computational efficiency’s increase possibilities and computer cluster systems using profits for computational industries.

9

Scalability tests of the direct numerical simulation solver UNS3

Szeliga W., Morzyński M., Stankiewicz W., Kotecki K.

Journal of Mechanical and Transport Engineering

|

2015

|

Vol. 67, No. 4

59--69

EN

In this paper analysis of scalability of the solver UNS3, dedicated to direct numerical simulation (DNS) of Navier-Stokes equations, is presented. Efficiency of parallel computations has been examined with the use of a PC cluster built by the Division of Virtual Engineering. Tests have been carried out on a different number of partitions, in the range of 1÷80. The test case was steady flow around a wall-mounted circular cylinder with Reynolds number set to the value of Re = 10. The research included the measurement of preparatory time, calculation time, communication time, speedup, core hours and efficiency.

PL

W niniejszym artykule zawarto analizę skalowalności solwera UNS3 służącego do obliczeń CFD (ang. computational fluid dynamics) typu DNS (ang. direct numerical simulation). Skuteczność wykorzystania wielowątkowości sprawdzano przy użyciu klastra Zakładu Inżynierii Wirtualnej. Badania prowadzono na procesorach typu Intel® CoreTM 2 Quad oraz Intel® Xeon® przy ilości partycji w zakresie 1÷80. Za testowe zadanie posłużyły obliczenia stacjonarne opływu cylindra o przekroju kołowym zamocowanego na ścianie, przy liczbie Reynoldsa Re = 10. Badano czas obliczeń, czas komunikacji międzywęzłowej, przyspieszenie w wyniku zrównoleglenia, zużycie zasobów oraz efektywność ich wykorzystania.

10

Analysis of parallelisation of 3D-CEMBS model using technologies like OpenACC and OpenMP

Piotrowski P.

Biuletyn Instytutu Morskiego w Gdańsku

|

2015

|

Vol. 30, No. 1

10--15

EN

Oceanographic models utilise parallel computing techniques to increase their performance. Computer hardware constantly evolves and software should follow to better utilise modern hardware potential. The number of CPU cores with access to shared memory increases with hardware evolution. To fully utilise the possibilities new hardware presents, parallelisation techniques employed in oceanographic models, which were designed with distributed memory systems in mind, have to be revised. This research focuses on analysing the 3D-CEMBS model to assess the feasibility of using OpenMP and OpenACC technologies to increase performance. This was done through static code analysis and profiling. The findings show that the main performance problems are attributed to task decomposition that was designed with distributed memory systems in mind. To fully utilise modern shared memory systems, other task decomposition strategies need to be employed. The presented 3D-CEMBS model analysis is a first stage in wider research of oceanographic models as a specific class of parallel applications. In the long term the research will result in proposing design patterns tailored for oceanographic models that would exploit their characteristics to achieve better hardware utilisation on evolving hardware architectures.

PL

Modele oceanograficzne wykorzystują przetwarzanie równoległe dla zwiększenia wydajności. Sprzęt komputerowy ciągle ewoluuje, więc oprogramowanie powinno zmieniać się razem z nim, aby w pełni wykorzystać potencjał współczesnego sprzętu. Wraz z rozwojem sprzętu komputerowego zwiększa się liczba rdzeni procesorów, które mają dostęp do pamięci współdzielonej. Aby w pełni wykorzystać możliwości nowego sprzętu, techniki zrównoleglania wykorzystywane w modelach oceanograficznych muszą zostać zrewidowane. Modele oceanograficzne były często projektowane z myślą o systemach z pamięcią rozproszoną. Niniejsze badania skupiają się na analizie modelu 3D-CEMBS pod kątem możliwości wykorzystania technologii OpenMP i OpenACC w celu podniesienia wydajności modelu. W tym celu została przeprowadzona statyczna analiza kodu modelu oraz profilowanie. Wyniki badań pokazują, że główny problem wydajnościowy modelu jest wynikiem zastosowania dekompozycji zadań przewidzianej dla systemów z pamięcią rozproszoną. Aby w pełni wykorzystać współczesne komputery z pamięcią współdzieloną należy wprowadzić inne strategie dekompozycji zadań.

11

How message passing interface (MPI) accelerates a coalescent-based whole genome simulator

Cyran K. A., Myszor D.

Studia Informatica

|

2014

|

Vol. 35, nr 4

59--72

PL

Symulacje komputerowe uważane są za jeden z filarów współczesnej nauki. W artykule opisano kolejny rodzaj optymalizacji programu GENOME: A rapid coalescent-based whole genome simulator, mającej na celu skrócenie czasu oczekiwania na wyniki. Modyfikacje bazują na zrównoleglaniu wykonywania procesów z wykorzystaniem technologii MPI oraz klastrów HPC. W celu przetestowania uzyskanego rozwiązania wykorzystano klaster HPC Ziemowit, będący na wyposażeniu Śląskiej Biofarmy. Wyniki wskazują, iż wprowadzone modyfikacje pozwalają na znaczne skrócenie czasu wykonywania aplikacji.

EN

Computer simulations are one of the pillars of contemporary science. In the current paper we present next type of improvements introduced into GENOME: A rapid coalescent-based whole genome simulator. The modifications are based on parallelization of processes with the use of MPI technology. The influence of introduced modification, has been tested on Ziemowit HPC cluster which is installed in Silesian Biofarma. Results point out that process of outcomes generation can be reduced significantly if proposed modifications are applied.

12

Turbine : A Distributed-memory Dataflow Engine for High Performance Many-task Applications

Wozniak J. M., Armstrong T. G., Maheshwari K., Lusk E. L., Katz D. S., Wilde M., Foster I. T.

Fundamenta Informaticae

|

2013

|

Vol. 128, nr 3

337--366

EN

Efficiently utilizing the rapidly increasing concurrency of multi-petaflop computing systems is a significant programming challenge. One approach is to structure applications with an upper layer of many loosely coupled coarse-grained tasks, each comprising a tightly-coupled parallel function or program. “Many-task” programming models such as functional parallel dataflow may be used at the upper layer to generate massive numbers of tasks, each of which generates significant tightly coupled parallelism at the lower level through multithreading, message passing, and/or partitioned global address spaces. At large scales, however, the management of task distribution, data dependencies, and intertask data movement is a significant performance challenge. In this work, we describe Turbine, a new highly scalable and distributed many-task dataflow engine. Turbine executes a generalized many-task intermediate representation with automated self-distribution and is scalable to multi-petaflop infrastructures. We present here the architecture of Turbine and its performance on highly concurrent systems.

13

Zastosowanie standardu MPI w systemie wielordzeniowym do analizy stanów nieustalonych obwodów elektrycznych

Forenc J.

Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska

|

2011

|

nr 4

29-32

PL

W pracy przedstawiono równoległą metodę analizy stanów nieustalonych obwodów elektrycznych zaimplementowaną w systemie wielordzeniowym. Do komunikacji pomiędzy działającymi procesami zastosowano standard przesyłania komunikatów MPI. W praktycznym przykładzie analizy stanu nieustalonego otrzymano dobrą dokładność obliczeń oraz skrócenie ich czasu w porównaniu z obliczeniami sekwencyjnymi.

EN

In this paper the parallel method for transient analysis of electrical circuits, implemented in the multi-core system, is presented. Communication among running processes was carried out with the use of the MPI standard. In the practical example of the transient analysis, a good accuracy of results and shortening computations time comparing the sequential method, were obtained.

14

Parallel QBF Solving with Advanced Knowledge Sharing

Lewis M., Schubert T., Becker B., Marin P., Narizzano M., Giunchiglia E.

Fundamenta Informaticae

|

2011

|

Vol. 107, nr 2/3

139-166

EN

In this paper we present the parallel QBF Solver PaQuBE. This new solver leverages the additional computational power that can be exploited from modern computer architectures, from pervasive multi-core boxes to clusters and grids, to solve more relevant instances faster than previous generation solvers. Furthermore, PaQuBE’s progressive MPI based parallel framework is the first to support advanced knowledge sharing in which solution cubes as well as conflict clauses can be exchanged between solvers. Knowledge sharing plays a critical role in the performance of PaQuBE. However, due to the overhead associated with sending and receiving MPI messages, and the restricted communication/network bandwidth available between solvers, it is essential to optimize not only what information is shared, but the way in which it is shared. In this context, we compare multiple conflict clause and solution cube sharing strategies, and finally show that an adaptive method provides the best overall results.

15

Parallel Large Scale Simulations in the PL-Grid Environment

Kurowski K., Piontek T., Kopta P., Mamoński M., Bosak B.

Computational Methods in Science and Technology

|

2010

|

Vol. spec. iss. (1)

47-56

EN

The growing demand for computational power causes that Grids are becoming mission-critical components in research and industry, offering sophisticated solutions in leveraging large-scale computing and storage resources. The nature a Grid in which resources are usually shared among multiple organizations offering resources under their control based on the “best effort” approach with no guarantee concerning the quality-of-service may be inadequate to support large-scale simulations. Requirements of such simulations often exceed capabilities of a single computing center causing the need to simultaneously allocate and synchronize resources belonging to many administrative domains whose functionality is missing in leading grid middlewares preventing researchers from executing large-scale simulations in grids. The paper presents tools and services that were designed to build multilayered infrastructure capable of dealing with computationally intensive large-scale simulations in the grid environment. The developed and deployed middleware enables computing clusters in different administrative domains to be virtually welded into a single powerful compute resource that can be treated as a quasi-opportunistic supercomputer. We describe the middleware developed in the QosCosGrid project and being enhanced under the PL-Grid national grid initiative, which provides advance reservation and resource co-allocation functionality as well as support for parallel large-scale applications based on OpenMPI (for C/C++ and Fortran) or ProActive for Java.

16

Porting the OPATM-BFM Application to a Grid e-Infrastructure – Optimization of Communication and I/O Patterns

Cheptsov A., Dichev K., Keller R., Lazzari P., Salon S.

Computational Methods in Science and Technology

|

2009

|

Vol. 15, nr 1

9-19

EN

OPATM-BFM is an off-line three-dimensional coupled eco-hydrodynamic simulation model used for biogeochemical and ecosystem-level predictions. This paper presents the first results of research activities devoted to the adaptation of the parallel OPATMBFM application for an efficient usage in modern Grid-based e-Infrastructures. For the application performance on standard Grid architectures providing generic clusters of workstations, such results are important. We propose a message-passing analysis technique for communication-intensive parallel applications based on a preliminary application run analysis. This technique was successfully used for the OPATM-BFM application and allowed us to identify several optimization proposals for the current realization of the communication pattern. As the suggested improvements are quite generic, they can be potentially useful for other parallel scientific applications.

17

Klastry obliczeniowe. Poznawanie i wykorzystywanie za pośrednictwem sieci Internet. Środowisko MPI

Majer M.

Zeszyty Naukowe. Elektryka / Politechnika Opolska

|

2007

|

Vol. 320, z. 58

137-144

PL

W dobie rozwoju technologicznego stajemy przed coraz trudniejszymi zadaniami obliczeniowymi. Alternatywą dla pogoni za mocą obliczeniową komputera jest wykorzystanie możliwości jakie oferuje system klastrowy. W artykule omówiono możliwości dotarcia do komputera klastrowego na przykładzie Politechniki Opolskiej. Przybliżono sposób nawiązywania połączenia oraz niezbędne wymagania. Opisano również bibliotekę równoległego programowania MPI. Artykuł zawiera również analizę wydajności przykładowego algorytmu algebry liniowej w systemie klastrowym.

EN

In a high technology age we live, single PC computers are often not enough for comlitated computing problems. This article is about methods of using computer cluster systems with a proper programing libraries, for example MPI library. It also contains an example analysis of linear algebra algorithm which was tested on a cluster computer system placed at Technical University of Opole.

18

An Ultrahigh Performance MPI Implementation on SGI® ccNUMA Altix® Systems

Feind K., McMahon K.

Computational Methods in Science and Technology

|

2006

|

Vol. spec. issue

67-70

EN

The SGI® Message Passing Toolkit (MPT) software has implemented algorithms that provide extremely high-performance message passing on SGI Altix systems based on the SGI NUMAlink™ interconnect technology. Using Linux® OS infrastructure and SGI XPMEM cross-host memory-mapping software, SGI MPI delivers extremely high MPI performance on shared-memory single host/SMP Altix systems as well as multihost superclusters. This paper outlines the Altix hardware features, OS features, and library software algorithms that have been developed to provide the low-latency and high-bandwidth capabilities. We present high-performance features like direct copy send/receive, collectives, and the ultralow-latency SHMEM™ data transfer library. We include MPI benchmark results, including an MPI ping pong latency that ranges from 1.2 to 2.3 microseconds on a 512-CPU Altix system with 1.5 GHz Intel® Itanium® 2 Processors.

19

A Duality between Forward and Adjoint MPI Communication Routines

Cheng B.

Computational Methods in Science and Technology

|

2006

|

Vol. spec. issue

23-24

EN

In this article, we explore a natural duality that exist between MPI communication routines in parallel programs, and show the ease of its adjoint implementation via pointers.

20

Porównanie efektywności zadań komunikacyjnych realizowanych w klastrze obliczeniowym przy wykorzystaniu bibliotek MPI - MPICH i LAM

Pokuta W.

Zeszyty Naukowe. Informatyka / Politechnika Opolska

|

2005

|

Vol. 302, z. 2

193-203

PL

Artykuł opisuje wyniki badań wykonanych na klastrze firmy INTEL porównujących oprogramowania klastrowe do obliczeń równoległych. Opisywane są więc możliwości transmisyjne pomiędzy węzłami tzn.: zależność czasu przesyłania danych w klastrze od liczby procesów uruchamianych na węzłach, liczby jednocześnie wykonywanych transmisji i kierunku przesyłania danych. Badana jest też wydajność obliczeń uruchamianych na jednym węźle w zależności od liczby procesów uruchamianych na tym węźle. Badania wykonywane są za pomocą programu ttcp analizującego przepustowość sieci oraz oprogramowania MPI (MPICH i LAM). Analizowana jest możliwość stworzenia funkcji rozgłaszającej (broadcast) o krótszym czasie wykonania niż dostarczony w oprogramowaniu MPICH i LAM.

EN

The paper describes results of research executed in INTEL cluster. The main goal was a comparison of cluster software used to parallel computing. Transmission capabilities among nodes are described, such as a dependence of data transmission time on a number of processes run at the nodes, a number of simultaneous transmissions and a direction of data trasmission. Furthermore, an efficiency of computations executed at one node depending on a number of processes run at that node is examined. The research were done using MPI software (MPICH and LAM) and also a ttcp program which analyses a network bandwidth. A possibility of creating a broadcast function with an execution time shorter than that supported by the MPICH and LAM software is analysed.