Results found: 32

Search results
Searched in keywords: high performance computing
EN
Application of advanced mesh-based methods, including the adaptive finite element method, is impossible without theoretical elaboration and practical realization of a model for the organization and functionality of the computational mesh. One of the most basic mesh functionalities is storing and providing geometrical coordinates for vertices and other mesh entities. A new algorithm for this task, based on on-the-fly recreation of coordinates, was developed. The conducted tests prove that, for selected cases, it can be orders of magnitude faster than the naive approach or other similar algorithms.
EN
This paper presents an innovative solution in the form of a virtual reality (VR) and high performance computing (HPC) system dedicated to aiding the design of rotary forming processes with laser beam reheating of the formed material. The invented method, which allows a virtual machine copy to be coupled with its actual counterpart, and a computing engine utilizing GPU processors of NVidia graphics cards to accelerate the computations are discussed. The completed experiments and simulations of the 316L stainless steel semi-product spinning process showed that the developed VR-HPC system allows the manufacturing process to be effectively engineered and controlled in industrial conditions.
EN
The problem of “reshaping” the fundamental education of navigators under conditions of intensive development of modern computer mathematics, intelligent technologies and high-performance computing is considered. The main attention is paid to the formation of an information-educational environment that provides intellectual support for the trainee. Examples of the use of intelligent technologies that help organize the learning process as a creative process of knowledge building are presented.
EN
The aim of this paper is to investigate dense linear algebra algorithms on shared memory multicore architectures. The design and implementation of a parallel tiled WZ factorization algorithm which can fully exploit such architectures are presented. Three parallel implementations of the algorithm are studied. The first one relies only on exploiting multithreaded BLAS (basic linear algebra subprograms) operations. The second implementation, in addition to BLAS operations, employs the OpenMP standard to use loop-level parallelism. The third implementation, in addition to BLAS operations, employs the OpenMP task directive with the depend clause. We report the computational performance and the speedup of the parallel tiled WZ factorization algorithm on shared memory multicore architectures for dense square diagonally dominant matrices. Then we compare our parallel implementations with the respective LU factorization from a vendor-implemented LAPACK library. We also analyze the numerical accuracy. Two of our implementations achieve a speedup close to the maximal theoretical speedup implied by Amdahl’s law.
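The speedup bound the abstract refers to can be illustrated with a short sketch of Amdahl's law; the 5% serial fraction in the example is a hypothetical value, not one taken from the paper:

```python
def amdahl_speedup(serial_fraction, n_cores):
    """Upper bound on speedup when `serial_fraction` of the work
    cannot be parallelized (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

# e.g. with 5% serial work, 16 cores give at most ~9.1x speedup
```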
EN
The previously proposed method for computing the latent heat in a system with many independently behaving components of the order parameter is presented for a chosen point of the phase diagram of the 3D Ashkin-Teller (AT) model. Binder, Challa, and Lee-Kosterlitz cumulants are exploited and supplemented by the use of the energy distribution histogram. The proposed computer experiments using the Metropolis algorithm calculate the cumulants in question, the internal energy and its partial contributions, as well as the energy distribution for the model Hamiltonian and its components. An important part of our paper is an attempt to validate the results obtained by several independent methods.
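For readers unfamiliar with the cumulants mentioned above, a minimal sketch of the fourth-order Binder cumulant computed from order-parameter samples (the generic textbook definition, not the authors' code) is:

```python
import numpy as np

def binder_cumulant(m):
    """Fourth-order Binder cumulant U = 1 - <m^4> / (3 <m^2>^2)
    of a sequence of order-parameter samples m."""
    m = np.asarray(m, dtype=float)
    m2 = np.mean(m ** 2)
    m4 = np.mean(m ** 4)
    return 1.0 - m4 / (3.0 * m2 ** 2)
```

For a sharply peaked (ordered-phase) distribution with |m| constant, U tends to 2/3; for a Gaussian (disordered-phase) distribution it tends to 0, which is what makes cumulant crossings useful for locating transitions.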
EN
In this paper, after a short theoretical introduction to modern techniques used in parallel computing, we report a case study related to the design and development of the Caliban Linux High Performance Computing cluster, carried out by the author in the High Performance Computing Laboratory of the University of L’Aquila. Finally, we report some performance evaluation tests of the Caliban cluster performed using the HPL (High-Performance Linpack) benchmark.
EN
The article presents an integrated (inferential) system of computer assistance in waste management designed in component-based technology. The system allows individual elements (system components) to be implemented in native and managed programming languages and execution technologies, ensuring easy integration of those components into one coherent, cooperating whole. One of the key issues involves the placement of objects, events and conducted spatial (geographical) analyses in the system through the application of GIS technology: the ability to use digital (vector or raster) terrain maps, execution of spatial analyses, data visualization on maps, etc., also using commonly available spatial data provided as part of the Infrastructure for Spatial Information (established under the Act on Infrastructure for Spatial Information).
EN
In this communication we present a hardware-oriented algorithm for calculating a constant matrix-vector product when all elements of the vector and matrix are complex numbers. The main idea behind our algorithm is to combine the advantages of Winograd’s inner product formula with Gauss’s trick for complex number multiplication. Compared with the naïve method of analogous calculations, the proposed algorithm drastically reduces the number of multipliers required for an FPGA implementation of complex-valued constant matrix-vector multiplication. While a fully parallel hardware implementation of the naïve (schoolbook) method for complex-valued matrix-vector multiplication requires 4MN multipliers, 2M N-input adders and 2MN two-input adders, the proposed algorithm requires only 3N(M+1)/2 multipliers, [3M(N+2)+1.5N+2] two-input adders and 3(M+1) N/2-input adders.
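Gauss's trick mentioned above replaces the four real multiplications of the schoolbook complex product with three, at the cost of a few extra additions; a minimal sketch:

```python
def gauss_complex_mult(a, b, c, d):
    """Compute (a + bi)(c + di) with three real multiplications
    instead of four (Gauss's trick)."""
    t1 = c * (a + b)
    t2 = a * (d - c)
    t3 = b * (c + d)
    return t1 - t3, t1 + t2   # (real, imaginary)
```

This per-product saving, combined with Winograd's inner-product formula, is what drives the multiplier count down from 4MN toward 3N(M+1)/2 in the proposed design.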
Efficient Simulation of Reaction Systems on Graphics Processing Units
EN
Reaction systems represent a theoretical framework based on the regulation mechanisms of facilitation and inhibition of biochemical reactions. The dynamic process defined by a reaction system is typically derived by hand, starting from the set of reactions and a given context sequence. However, this procedure may be error-prone and time-consuming, especially as the size of the reaction system increases. Here we present HERESY, a simulator of reaction systems accelerated on Graphics Processing Units (GPUs). HERESY is based on a fine-grained parallelization strategy, whereby all reactions are simultaneously executed on the GPU, thereby reducing the overall running time of the simulation. HERESY is particularly advantageous for the simulation of large-scale reaction systems consisting of hundreds or thousands of reactions. By considering as test cases some reaction systems with an increasing number of reactions and entities, as well as an increasing number of entities per reaction, we show that HERESY achieves up to a 29× speed-up with respect to a CPU-based simulator of reaction systems. Finally, we provide some directions for the optimization of HERESY, considering minimal reaction systems in normal form.
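The result function such a simulator evaluates per reaction can be sketched as follows (sequentially here; HERESY executes all reactions in parallel on the GPU). The set-based encoding is the standard definition of reaction systems, not HERESY's actual data layout:

```python
def result(reactions, state):
    """Apply all enabled reactions to a state (a set of entities).
    A reaction (R, I, P) is enabled iff R is a subset of the state
    and no inhibitor from I is present; the result is the union of
    the products of all enabled reactions."""
    products = set()
    for reactants, inhibitors, prods in reactions:
        if reactants <= state and not (inhibitors & state):
            products |= prods
    return products

# The next state of an interactive process is result(state | context).
```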
Using Redis supported by NVRAM in HPC applications
EN
Nowadays, the efficiency of storage systems is a bottleneck in many modern HPC clusters. High performance is often difficult to obtain in the traditional approach (processing using files) because of the complexity of the model and its read/write patterns. An alternative approach is to apply a key-value database, which usually has low latency and scales well. On the other hand, many key-value stores suffer from limited memory capacity and vulnerability to serious failures, which is caused by processing in RAM. Moreover, some research suggests that scientific data models are not applicable to the storage structures of key-value databases. In this paper, the author proposes resolving the mentioned issues by replacing RAM with NVRAM. The practical example is based on the Redis NoSQL store. The article also contains three domain-specific APIs that show the idea behind the transformation from an HPC data model to Redis structures, as well as the results of two micro-benchmarks.
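The kind of transformation described above (mapping an HPC data structure onto key-value entries) can be sketched as below. The block-chunked key scheme and the in-memory stand-in store are illustrative assumptions; a real deployment would use a Redis client instead of the `FakeKV` class:

```python
import pickle

class FakeKV:
    """In-memory stand-in for a Redis-like key-value store (illustrative)."""
    def __init__(self):
        self._data = {}
    def set(self, key, value):
        self._data[key] = value
    def get(self, key):
        return self._data[key]

def put_array(store, name, values, block=4):
    """Store a 1-D array as block-sized entries under keys name:0, name:1, ..."""
    n_blocks = 0
    for i in range(0, len(values), block):
        store.set(f"{name}:{n_blocks}", pickle.dumps(values[i:i + block]))
        n_blocks += 1
    return n_blocks

def get_array(store, name, n_blocks):
    """Reassemble the array from its block entries."""
    out = []
    for b in range(n_blocks):
        out.extend(pickle.loads(store.get(f"{name}:{b}")))
    return out
```

Chunking keeps individual values small and lets readers fetch only the blocks they need, which is one way a file-oriented access pattern can be mapped onto a key-value store.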
EN
In this paper we propose a new method of scheduling distributed applications in a cloud environment according to the High Performance Computing as a Service concept. We assume that applications submitted for execution are specified as task graphs. Our method dynamically schedules all the tasks, using resource sharing by the applications. The goal of scheduling is to minimize the cost of resource hiring and the execution time of all incoming applications. Experimental results showed that our method gives significantly better utilization of computational resources than existing management methods for clouds.
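A greatly simplified sketch of scheduling a task graph onto a pool of identical workers (plain greedy list scheduling; the cost model and tie-breaking here are illustrative, not the method proposed in the paper):

```python
def schedule(deps, durations, n_workers):
    """Greedy list scheduling of a task DAG onto identical workers.
    deps: task -> set of prerequisite tasks; durations: task -> time.
    Returns the finish time of every task."""
    finish = {}                      # task -> finish time
    free = [0.0] * n_workers         # earliest free time per worker
    remaining = set(durations)
    while remaining:
        # tasks whose prerequisites have all finished (deterministic order)
        ready = sorted(t for t in remaining
                       if deps.get(t, set()) <= finish.keys())
        task = ready[0]
        earliest = max((finish[d] for d in deps.get(task, set())), default=0.0)
        w = min(range(n_workers), key=lambda i: free[i])
        start = max(free[w], earliest)
        finish[task] = start + durations[task]
        free[w] = finish[task]
        remaining.remove(task)
    return finish
```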
Finite element core calculations and stream processing
EN
We present the execution model and performance analysis for an important phase of finite element calculations: the creation of systems of linear equations. We assume that the process is realized using a set of CPU cores and GPU multiprocessors, with CPU and GPU memories connected by PCIe links for data transfer. We analyse the use of linear data structures designed specially for GPU processing. We present examples of calculations for the standard first-order FEM approximation and typical contemporary hardware. We draw conclusions on the feasibility of the proposed approach.
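The creation-of-linear-systems phase discussed above amounts to scattering small element matrices into a global matrix. A serial 1-D sketch for linear elements (the paper's actual CPU/GPU data structures are more elaborate):

```python
import numpy as np

def assemble_1d_laplace(n_elems, h=1.0):
    """Assemble the global stiffness matrix for the 1-D Laplace operator
    discretized with linear finite elements of size h."""
    # element stiffness matrix for a single linear element
    ke = (1.0 / h) * np.array([[1.0, -1.0], [-1.0, 1.0]])
    K = np.zeros((n_elems + 1, n_elems + 1))
    for e in range(n_elems):
        dofs = [e, e + 1]                    # global indices of local nodes
        for a in range(2):
            for b in range(2):
                K[dofs[a], dofs[b]] += ke[a, b]
    return K
```

On a GPU the same scatter-add is performed by many threads at once, which is why race-free data layouts matter there.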
EN
The use of elastic bodies within multibody simulation has become more and more important in recent years. To include elastic bodies, described as finite element models, in multibody simulations, the dimension of the system of ordinary differential equations must be reduced by projection. For this purpose, in this work, the modal reduction method, a component mode synthesis (CMS) based method and a moment-matching method are used. Due to the ever increasing size of the non-reduced systems, the calculation of the projection matrix leads to a large demand for computational resources and cannot be done on usual serial computers with the available memory. In this paper, the model reduction software Morembs++ is presented, using a parallelization concept based on the Message Passing Interface to satisfy the memory requirements and reduce the runtime of the model reduction process. Additionally, the behaviour of the Block-Krylov-Schur eigensolver, implemented in the Anasazi package of the Trilinos project, is analysed with regard to the choice of the size of the Krylov basis, the block size and the number of blocks. Besides, an iterative solver is considered within the CMS-based method.
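The projection step common to all three reduction methods can be illustrated with a small dense modal-reduction sketch (NumPy, symmetric matrices assumed; Morembs++ of course targets large sparse FE matrices in parallel):

```python
import numpy as np

def modal_reduction(M, K, r):
    """Project mass/stiffness matrices (M, K) onto the r lowest
    eigenmodes of K v = lambda M v. Assumes M symmetric positive
    definite and K symmetric. Returns (M_reduced, K_reduced, V)."""
    L = np.linalg.cholesky(M)
    Linv = np.linalg.inv(L)
    A = Linv @ K @ Linv.T            # equivalent symmetric standard problem
    w, Q = np.linalg.eigh(A)         # eigenvalues in ascending order
    V = Linv.T @ Q[:, :r]            # n x r projection matrix
    return V.T @ M @ V, V.T @ K @ V, V
```

The reduced r x r system preserves the low-frequency dynamics while being far cheaper to integrate inside the multibody simulation.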
EN
The paper presents the BeesyCluster system as middleware allowing invocation of services on high performance computing resources within the NIWA Centre of Competence project. Access is possible through both WWW and SOAP Web Service interfaces. The former allows non-experienced users to invoke both simple and complex services exposed through easy-to-use servlets. The latter is meant for integration of external applications with services made available from clusters or servers. Details of the services, such as the APIs used for development (MPI, OpenMP, OpenCL) as well as queuing systems, are hidden from the user. The paper describes both the WWW and Web Service interfaces, extended for use with large files. Mechanisms for selecting devices for execution of services are described, along with experiments including remote invocations.
EN
The results of numerical simulations of solidification are presented in this paper. The computational model uses the heat transfer equation with a heat source term in the explicit enthalpy formulation. An indirect model of solid phase growth was used; it assumes finite diffusion of the solute in the liquid phase and no diffusion in the solid phase. The resulting differential equations are solved with the Bubnov-Galerkin Finite Element Method (space discretization). The calculations were performed on a distributed-memory computer cluster positioned 7th on the TOP500 list from November 2014. We carried out tests using meshes of up to 25 million tetrahedral elements. The scalability tests were conducted for up to 2048 processor cores.
EN
In this paper the additive algorithm of spectral analysis is considered. This algorithm consists of the algebraic summation of samples of basis functions taken at certain points of the interval of the independent variable of a given function. Two variants of simulation of the additive algorithm are considered. In the first variant, the process of obtaining discrete values of the continuous spectrum of a continuous function is considered. The second variant uses the additive algorithm for the Discrete Cosine Transform (DCT), which is widely used in practice for converting graphic images. The concept of accelerated calculation of the DCT is illustrated on examples of real two-dimensional graphic images. Fragments of the proposed programs for simulation of the additive algorithm for continuous signals and for image processing are presented in a meta language.
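The direct summation that the additive algorithm accelerates can be written, for the 1-D DCT-II (unnormalized, a common textbook form; the paper's exact variant may differ), as:

```python
import math

def dct2_1d(x):
    """DCT-II of a 1-D signal by direct summation:
    X[k] = sum_n x[n] * cos(pi * (2n + 1) * k / (2N))."""
    N = len(x)
    return [sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                for n in range(N))
            for k in range(N)]
```

For a 2-D image the transform is applied separably along rows and then columns, so any speed-up of this 1-D kernel carries over directly.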
EN
Over the decades, the rapid development of broadly defined computer technologies, both software and hardware, has been observed. Unfortunately, software solutions regularly lag behind the hardware. On the other hand, modern systems are characterized by a high demand for computing resources and the need for customization for end users. As a result, the traditional way of system construction is too expensive and inflexible, and it does not achieve high resource utilization. The present article focuses on the problem of effective use of available physical and virtual resources based on the OpenStack cloud computing platform. A number of conducted experiments allowed us to evaluate computing resource utilization and to analyze performance depending on the allocated resources. Additionally, the paper includes a structural and functional analysis of the OpenStack cloud platform.
EN
Parallelization of processing in Monte Carlo simulations of the Ising spin system, with the lattice distributed in stripes, is proposed. Message passing is applied, and one-sided MPI communication with the MPI memory window is exploited. The 2D Ising spin lattice model is taken for testing purposes. The scalability of processing in our simulations is tested in real-life computing on high performance multicomputers and discussed on the basis of speedup and efficiency. The larger the lattice, the better the scalability obtained.
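The spin updates being parallelized are ordinary Metropolis steps; a serial single-sweep sketch of the 2D model (coupling J = 1, zero field, periodic boundaries; the stripe decomposition and MPI windows from the paper are omitted):

```python
import math
import random

def metropolis_sweep(lattice, beta, rng=random):
    """One Metropolis sweep of a 2D Ising lattice of +1/-1 spins,
    periodic boundary conditions, J = 1, no external field."""
    n = len(lattice)
    for _ in range(n * n):
        i, j = rng.randrange(n), rng.randrange(n)
        s = lattice[i][j]
        neighbours = (lattice[(i + 1) % n][j] + lattice[(i - 1) % n][j]
                      + lattice[i][(j + 1) % n] + lattice[i][(j - 1) % n])
        delta_e = 2.0 * s * neighbours      # energy change of flipping s
        if delta_e <= 0 or rng.random() < math.exp(-beta * delta_e):
            lattice[i][j] = -s
```

In the stripe decomposition each rank sweeps its own rows and exchanges only the boundary rows, which is what the one-sided MPI communication is used for.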
EN
The coupled finite element multiscale simulations (FE2) require costly numerical procedures in both the macro and micro scales. Attempts to improve numerical efficiency are focused mainly on two areas of development, i.e. parallelization/distribution of numerical procedures and simplification of the virtual material representation. One representative of both mentioned areas is the idea of the Statistically Similar Representative Volume Element (SSRVE). It aims at reducing the number of finite elements in the micro scale, as well as at parallelization of the micro-scale calculations, which can then be performed without barriers. The simplification of the computational domain is realized by transformation of sophisticated images of the material microstructure into artificially created simple objects characterized by features similar to those of their original equivalents. In existing solutions for two-phase steels, the SSRVE is created on the basis of an analysis of the shape coefficients of the hard phase in the real microstructure and a search for a representative simple structure with similar shape coefficients. Optimization techniques were used to solve this task. In the present paper, local strains and stresses are added to the cost function in the optimization. Various forms of the objective function, composed of different elements, were investigated and used in the optimization procedure for the creation of the final SSRVE. The results are compared with respect to the efficiency of the procedure and the uniqueness of the solution. The best objective function, composed of shape coefficients as well as strains and stresses, was proposed. Examples of SSRVEs determined for the investigated two-phase steel using that objective function are demonstrated in the paper. Each step of SSRVE creation is investigated from the computational efficiency point of view.
A proposal for implementation of the whole computational procedure on modern High Performance Computing (HPC) infrastructures is described. It includes the software architecture of the solution, as well as a presentation of the middleware applied for data farming purposes.
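An objective of the kind described, mixing shape-coefficient similarity with local strain/stress similarity, can be sketched as a weighted sum of mismatches. The weights, the squared-error form, and the argument vectors below are illustrative assumptions, not the paper's exact formulation:

```python
def ssrve_cost(shape_real, shape_cand, strain_real, strain_cand,
               stress_real, stress_cand, w=(1.0, 1.0, 1.0)):
    """Illustrative SSRVE objective: weighted mean-squared mismatch of
    shape coefficients, local strains and local stresses between the
    real microstructure and a candidate SSRVE."""
    def mse(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) / max(len(a), 1)
    return (w[0] * mse(shape_real, shape_cand)
            + w[1] * mse(strain_real, strain_cand)
            + w[2] * mse(stress_real, stress_cand))
```

An optimizer then searches the candidate-geometry space for the SSRVE minimizing this cost; each evaluation involves a micro-scale FE solve, which is why HPC infrastructure and data farming middleware are relevant.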
EN
Effective, simulation-based trajectory optimization algorithms adapted to heterogeneous computers are studied with reference to a problem taken from alpine ski racing (the presented solution is probably the most general one published so far). The key idea behind these algorithms is to use a grid-based discretization scheme to transform the continuous optimization problem into a search problem over a specially constructed finite graph, and then to apply dynamic programming to find an approximation of the global solution. In the analyzed example it is the minimum-time ski line, represented as a piecewise-linear function (a method of eliminating unfeasible solutions is proposed). Serial and parallel versions of the basic optimization algorithm are presented in detail (pseudo-code, time and memory complexity). Possible extensions of the basic algorithm are also described. The implementation of these algorithms is based on OpenCL. The included experimental results show that contemporary heterogeneous computers can be treated as μ-HPC platforms: they offer high performance (the best speedup was equal to 128) while remaining energy- and cost-efficient (which is crucial in embedded systems, e.g. the trajectory planners of autonomous robots). The presented algorithms can be applied to many trajectory optimization problems, including those with a black-box represented performance measure.
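The graph-search core of such algorithms (choose one node per discretization layer so that the accumulated traversal time is minimal) can be sketched with plain dynamic programming; the layered-grid encoding and the cost callback below are illustrative:

```python
def min_time_trajectory(layers, edge_time):
    """Dynamic programming over a layered graph: pick one node per layer
    so that the sum of edge_time(u, v) over consecutive layers is minimal.
    Returns (total_time, node_per_layer)."""
    cost = {v: 0.0 for v in layers[0]}       # best time to reach each node
    back = [{} for _ in layers]              # backpointers per layer
    for k in range(1, len(layers)):
        new_cost = {}
        for v in layers[k]:
            best_u = min(layers[k - 1],
                         key=lambda u: cost[u] + edge_time(u, v))
            new_cost[v] = cost[best_u] + edge_time(best_u, v)
            back[k][v] = best_u
        cost = new_cost
    # reconstruct the minimum-time piecewise-linear path
    v = min(cost, key=cost.get)
    total = cost[v]
    path = [v]
    for k in range(len(layers) - 1, 0, -1):
        v = back[k][v]
        path.append(v)
    return total, path[::-1]
```

Each layer can be processed with all its nodes evaluated independently, which is the property the OpenCL parallelization of such algorithms exploits.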