Search results - BazTech

1

Tabular minimal redundant modular structures for fast and high-precision computations using general-purpose computers

Selianinau M.

Scientific Issues of Jan Długosz University in Częstochowa. Mathematics | 2017 | Vol. 22 | 129--139

EN

The present paper continues research on parallel information processing based on tabular modular computing structures. We deal with the methodology of using a minimal redundant modular number system for high-speed and high-precision computation on modern universal multicore processors. The advantages of a formal computing mode based on modular arithmetic are demonstrated by the example of implementing digital signal processing procedures. The article presents the additive and additive-multiplicative formal computing schemes, together with the obtained estimates of the cardinality of the working ranges needed to carry out the calculations.
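The idea of table-driven modular arithmetic mentioned in the abstract can be illustrated with a minimal sketch of an ordinary residue number system. The moduli, function names, and table layout below are assumptions chosen for illustration; the sketch does not model the minimal redundant encoding or the formal computing schemes of the paper itself.

```python
from math import prod

# Hypothetical pairwise coprime moduli, chosen only for illustration.
MODULI = (7, 11, 13, 15)
M = prod(MODULI)  # size of the working range: integers 0 .. M-1

# One multiplication lookup table per modulus (the "tabular" part).
MUL_TABLES = [[[(x * y) % m for y in range(m)] for x in range(m)]
              for m in MODULI]


def to_residues(x):
    """Encode an integer as its tuple of residues."""
    return tuple(x % m for m in MODULI)


def add(a, b):
    """Add two encoded numbers; each channel is independent of the others."""
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))


def mul(a, b):
    """Multiply two encoded numbers with one table lookup per channel."""
    return tuple(MUL_TABLES[k][x][y] for k, (x, y) in enumerate(zip(a, b)))


def from_residues(r):
    """Decode with the Chinese remainder theorem."""
    total = 0
    for ri, m in zip(r, MODULI):
        Mi = M // m
        total += ri * Mi * pow(Mi, -1, m)  # modular inverse (Python 3.8+)
    return total % M


if __name__ == "__main__":
    x, y = to_residues(123), to_residues(45)
    print(from_residues(add(mul(x, y), x)))  # 123 * 45 + 123
```

Because every channel works on small residues without carries between channels, the per-modulus operations could in principle be distributed across cores; the result is only valid while the true value stays inside the working range 0 .. M-1.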

2

Komputerowe symulacje procesów fizycznych z zastosowaniem heterogenicznych układów wielordzeniowych [Computer simulations of physical processes using heterogeneous multi-core systems]

Michalski G.

Studia i Materiały / Europejska Uczelnia Informatyczno-Ekonomiczna w Warszawie | 2016 | Nr 1(11) | 13--21

PL (translated)

The problems facing today's engineers very often require complex computer simulations of the phenomenon under study. In the vast majority of such simulations, distributions of various physical quantities are computed, such as temperature, strain, or displacement. Because of the high complexity of such tasks, running them on ordinary general-purpose processors becomes inefficient. Engineers increasingly turn to modern heterogeneous multi-core devices such as graphics processors. These hardware solutions allow computations to be accelerated significantly. In this work the author presents a computer simulation of the solidification of a casting in a foundry mold using nVidia graphics processors compatible with the CUDA architecture.

EN

The problems faced by today's engineers very often require complex computer simulations of the phenomenon under consideration. In the great majority of these simulations, distributions of various physical quantities such as temperature, deformations, and displacements are calculated. Because of the high complexity of these tasks, running them on general-purpose processors becomes ineffective. Engineers increasingly reach for modern many-core heterogeneous systems such as GPUs. These hardware solutions can significantly speed up the computations. In this work the author presents a computer simulation of the solidification of a casting in a mold using nVidia GPUs compatible with the CUDA architecture.

3

Speed analyse of two step algorithms of trigonometric transformations on multi-core processors

Cegielski M.

Przegląd Elektrotechniczny | 2012 | R. 88, nr 3a | 47-48

EN

Two-stage trigonometric transformation algorithms have fully symmetric calculations at each stage of the algorithm. Such an algorithm can be decomposed arbitrarily, which allows the calculations to be split into any number of processes that can be executed independently within one step of the algorithm. Additionally, a single step of the algorithm may depend on the size of the data and the associated number of arithmetic operations, whose execution may depend on the available hardware resources. The article presents and compares the results of computational experiments on multi-core processors.

PL (translated)

Two-stage trigonometric transformation algorithms have full computational symmetry across the individual blocks of the algorithm. Such an algorithm can be decomposed arbitrarily, which allows the computation to be split into any number of processes that can run independently within one step of the algorithm. In addition, a single step of the algorithm may depend on the size of the data and the associated number of arithmetic operations, whose execution may depend on the available hardware resources. The article presents and compares the algorithm speed results obtained on multi-core processors.

4

ab-Stream - A Framework for programming Many-core

Gan X., Wang Z., Shen L., Zhu Q.

Przegląd Elektrotechniczny | 2012 | R. 88, nr 7b | 341--344

EN

The common approach to programming many-core processors is to write processor-specific code with low-level APIs for each processor, which can achieve good performance but results in serious portability issues: programmers must write a separate version of the code for each target architecture. Therefore, we present ab-Stream, an extensible framework for programming many-threaded processors based on the SUIF Intermediate Representation. ab-Stream abstracts many-core, many-threaded processors into a unified architecture, and an ab-Stream program is an OpenMP-like program with different directives for many-core processors. Furthermore, a prototype of ab-Stream was implemented to map ab-Stream programs onto many-core GPUs. Experiments show that our implementation executes the transformed code correctly and efficiently on CUDA-enabled GPUs. Moreover, the ab-Stream code produced by our prototype can outperform the original GPU code and comes close to hand-optimized GPU code in performance.

PL (translated)

The ab-Stream framework for programming many-core processors is presented. The system is based on the SUIF format (Stanford University Intermediate Format).

5

Równoległe implementacje algorytmu Gaussa-Seidela w środowisku OpenMP [Parallel implementations of the Gauss-Seidel algorithm in the OpenMP environment]

Sadecki J.

Pomiary Automatyka Kontrola | 2011 | R. 57, nr 3 | 301-304

PL (translated)

The article presents sample results of an efficiency analysis of parallel realizations of the Gauss-Seidel algorithm implemented on multi-core processors. As shown, the standard parallel implementation of this algorithm yields worse results, in terms of convergence rate, than the sequential version of the method. The proposed new parallel version of the Gauss-Seidel method has a convergence rate analogous to that of its sequential realization while remaining easy to implement in parallel. The article presents sample results of computations carried out on a quad-core processor. The implementation considered can also be applied to a broader class of optimization problems than the one considered in the paper.

EN

The paper presents the results of an efficiency analysis of several parallel realizations of optimization algorithms on multicore processors. The results concern a simple Gauss-Seidel optimization algorithm. Both the standard parallel implementation and a new parallel implementation of the Gauss-Seidel algorithm are presented. As is pointed out, the standard parallel algorithm leads to worse numerical results (in terms of the rate of convergence) than the sequential version of the algorithm. The new parallel algorithm achieves the same numerical efficiency as the sequential algorithm and, additionally, can be easily implemented on multicore processors. It is proved that, for the quadratic optimization problem, the modified parallel Gauss-Seidel algorithm leads to the same computational results as the sequential implementation of the method. Some examples of parallel implementations of the method on four-core processors are presented. The proposed new algorithm achieves good efficiency of parallel computations both in terms of execution time and in terms of the speedup factor. The new algorithm can also be used to solve broader classes of optimization problems that, in the nearest neighbourhood of the optimal solution, can be approximated with sufficient precision by a quadratic function.
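The convergence gap described in the abstract can be sketched on a small quadratic problem, minimizing 0.5·xᵀAx − bᵀx, whose minimizer solves Ax = b. The sketch below assumes that the "standard parallel" variant updates all coordinates simultaneously from the previous iterate (a Jacobi-style sweep); the abstract does not state this, and the paper's new parallel algorithm is not reproduced here. The matrix, right-hand side, and function names are illustrative assumptions.

```python
def sweep_gauss_seidel(A, b, x):
    """Sequential sweep: each coordinate uses the freshest values."""
    x = list(x)
    n = len(b)
    for i in range(n):
        s = sum(A[i][j] * x[j] for j in range(n) if j != i)
        x[i] = (b[i] - s) / A[i][i]
    return x


def sweep_simultaneous(A, b, x):
    """Assumed 'standard parallel' sweep: all coordinates read the old x,
    so the n updates are independent and could run on separate cores."""
    n = len(b)
    return [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)]


def iterate(sweep, A, b, tol=1e-10, max_iter=1000):
    """Repeat sweeps until the largest coordinate change drops below tol."""
    x = [0.0] * len(b)
    for k in range(1, max_iter + 1):
        x_new = sweep(A, b, x)
        if max(abs(p - q) for p, q in zip(x_new, x)) < tol:
            return x_new, k
        x = x_new
    return x, max_iter


# Symmetric positive definite test problem with exact solution (1, 1, 1).
A = [[4.0, -1.0, -1.0],
     [-1.0, 4.0, -1.0],
     [-1.0, -1.0, 4.0]]
b = [2.0, 2.0, 2.0]

if __name__ == "__main__":
    x_gs, k_gs = iterate(sweep_gauss_seidel, A, b)
    x_par, k_par = iterate(sweep_simultaneous, A, b)
    print("Gauss-Seidel sweeps:", k_gs)
    print("simultaneous sweeps:", k_par)
```

On this test problem the simultaneous sweep needs more sweeps than the sequential one to reach the same tolerance, which matches the qualitative behaviour the abstract reports for the standard parallel implementation.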

6

Automatyczne zrównoleglanie kodu aplikacji systemów wbudowanych [Automatic parallelization of application code for embedded systems]

Pałkowski M.

Pomiary Automatyka Kontrola | 2010 | R. 56, nr 7 | 656-658

PL (translated)

The article presents a technique for automatically parallelizing application code in order to make effective use of the computing power of multi-core processors in embedded systems. The technique is based on analyzing data dependences in program loops, partitioning their iteration space, and identifying independent code fragments. The result of the transformation is parallel code that conforms to the OpenMP standard and is equivalent to its sequential counterpart, together with the possibility of speeding up the computations of an industrial computer.

EN

In a fairly conservative group of solutions, such as industrial computers, the increasing miniaturization of processing units is becoming noticeable. The size and power consumption of the units are important, but processing efficiency is also significant. Installing multi-core processors in embedded systems allows parallel code written to the OpenMP standard to be executed. Multi-core programming speeds up calculations, so that test and measurement-processing systems can process a larger amount of measurement data. For this purpose, techniques for transforming program code into a parallel form are necessary; loop parallelization transformations are particularly significant, because the vast majority of calculations take place in loops. There are many techniques for loop parallelization, such as unimodular and affine transformations. However, these techniques extract parallelism only for a specified set of loops and, owing to their limitations, fail to find the full parallelism in a loop. In this paper, the Iteration Space Slicing Framework is presented. The framework was designed to extract parallelism in loops automatically and to overcome the limitations of well-known techniques. The result of the transformation is parallel code that includes OpenMP pragmas. The speedup, efficiency, and locality of the code are examined. Future continuation of the work is considered.
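The general idea of splitting a loop's iteration space into independent fragments can be shown with a hand-worked sketch. It is not the Iteration Space Slicing Framework and does not generate OpenMP pragmas; the loop, the function names, and the manual choice of slices are assumptions for illustration. In the loop below, iteration i depends only on iteration i-2, so the even and odd iterations form two dependence chains that never touch each other's data.

```python
from concurrent.futures import ThreadPoolExecutor


def sequential(values):
    """Reference loop: iteration i reads the result of iteration i - 2."""
    a = list(values)
    for i in range(2, len(a)):
        a[i] = a[i - 2] + i
    return a


def run_slice(a, start):
    """Execute one dependence chain (start, start + 2, ...) in order."""
    for i in range(start, len(a), 2):
        a[i] = a[i - 2] + i


def sliced(values):
    """Run the even and odd chains as two independent tasks."""
    a = list(values)
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(run_slice, a, start) for start in (2, 3)]
        for f in futures:
            f.result()  # re-raise any exception from a worker
    return a


if __name__ == "__main__":
    data = list(range(10))
    print(sliced(data) == sequential(data))
```

Each slice keeps its iterations in the original order and writes only to its own elements, so the sliced version produces the same array as the sequential loop. In CPython the two threads will not give a real speedup because of the global interpreter lock; the sketch only demonstrates that the decomposition preserves the result.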