Search results - BazTech

1

Adding parallelism to sequential programs : a combined method

Daszczuk Wiktor B., Czejdo Danny B., Grześkowiak Wojciech

International Journal of Electronics and Telecommunications | 2024 | Vol. 70, No. 1 | 135--144

EN

The article outlines a contemporary method for creating software for multi-processor computers. It describes the identification of parallelizable sequential code structures. Three structures were found and then carefully examined. The algorithms used to determine whether or not certain parts of code may be parallelized result from static analysis. The techniques demonstrate how, if possible, existing sequential structures might be transformed into parallel-running programs. A dynamic evaluation is also a part of our process, and it can be used to assess the efficiency of the parallel programs that are developed. As a tool for sequential programs, the algorithms have been implemented in C#. All proposed methods were discussed using a common benchmark.

2

Development and verification of a high-precision laser measurement system for straightness and parallelism measurement

Xu Peng, Li Rui Jun, Zhao Wen Kai, Chang Zhen Xin, Ma Shao Hua, Fan Kuang Chao

Metrology and Measurement Systems | 2021 | Vol. 28, nr 3 | 479--495

EN

A laser measurement system that uses a semiconductor laser to measure straightness and parallelism errors was proposed. The design principle of the developed system was analyzed. To address the large divergence angle of the semiconductor laser and the loss of measurement accuracy caused by the diffraction effect of the light spot at long working distances, the optical structure of the system was optimized through a series of simulations and experiments. A plano-convex lens was used to collimate the laser beam and concentrate the energy distribution of the diffraction effect. After the optical optimization, the working distance of the system increased from 2.6 m to 4.6 m, and the repeatability of the displacement measurement was kept within 2.2 μm over the total measurement range. The performance of the developed system was verified by measuring the straightness of a machine tool in comparison tests with two commercial multi-degree-of-freedom measurement systems. Two different measurement methods were used to verify the measurement accuracy. The comparison results show that, when measuring the straightness of the machine tool, the laser head should be fixed in front of the moving axis and the sensing part should move with the moving table of the machine tool. The results also show that the straightness measurement error is less than 3 μm compared with the commercial systems. The developed laser measurement system offers high precision, a long working distance, low cost, and suitability for straightness and parallelism error measurement.

3

Algorytmy i pomiary odchyłek równoległości i prostopadłości płaszczyzn i prostych w przestrzeni R3 na współrzędnościowej maszynie pomiarowej

Filipowski R., Lechniak Z., Zawora J.

Obróbka Metalu | 2018 | nr 1 | 32--41

PL

Przedstawiono algorytmy obliczania odchyłek w mm/m równoległości i prostopadłości płaszczyzn i prostych na współrzędnościowej maszynie pomiarowej (WMP). Obliczane odchyłki muszą mieścić się w polu tolerancji równoległości lub prostopadłości zadawanych przez konstruktora na powierzchniach części maszyn. Tolerancje równoległości i prostopadłości oznaczone są na rysunkach w ramce prostokątnej i zawierają symbol równoległości lub prostopadłości oraz graniczną wartość odchyłki w mm względem jednej bazy lub dwóch baz. W artykule zamieszczono wartości odchyłek równoległości i prostopadłości płaszczyzn i prostych w mm/m mierzonych na maszynie współrzędnościowej WMP sztywną głowicą pomiarową.

EN

Algorithms and measurements of deviations of parallelism and perpendicularity of planes and straight lines in R3 space on a coordinate measuring machine

Algorithms for calculating deviations, in mm/m, of parallelism and perpendicularity of planes and straight lines on a coordinate measuring machine (CMM) are presented. The calculated deviations must lie within the parallelism or perpendicularity tolerance specified by the designer for the surfaces of machine parts. Parallelism and perpendicularity tolerances are marked on drawings in a rectangular frame that contains the symbol of parallelism or perpendicularity and the limit value of the deviation in mm relative to one or two datums. The paper reports deviations of parallelism and perpendicularity of planes and straight lines, in mm/m, measured on a CMM with a rigid measuring head.

4

On the maximal dimensionality of tiles in tiled code generated by means of Affine Transformations

Bielecki W., Pałkowski M.

Przegląd Elektrotechniczny | 2015 | R. 91, nr 11 | 158-161

EN

Tiling (blocking) is a very important iteration reordering transformation for both improving data locality and extracting loop nest parallelism. Affine transformations are one of the most powerful approaches to generating tiled code. Tile dimensionality has a strong impact on tiled code performance. This paper presents a way to discover, before tiling, the maximal dimensionality of tiles in code generated by means of affine transformations.

PL

Blokowanie jest bardzo ważną transformacją reorganizacji iteracji zarówno dla poprawy lokalności pętli, jak i dla ekstrakcji równoległości w gnieździe pętli programowej. Przekształcenia afiniczne są jednym z najbardziej mocnych podejść do implementacji techniki blokowania. W artykule przedstawiono sposób, za pomocą którego można odkryć przed zastosowaniem blokowania, jaki jest maksymalny wymiar bloków w kodzie generowanym za pomocą przekształceń afinicznych, który ma silny wpływ na wydajność kodu.
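
As a rough illustration of the tiling (blocking) transformation this abstract refers to, the sketch below reorders a two-dimensional iteration space into rectangular tiles. It is not the authors' method for determining maximal tile dimensionality: the function names `untiled` and `tiled` and the tile sizes are hypothetical, and the sketch assumes the loop nest has no dependences that forbid rectangular tiling.

```python
def untiled(n, m):
    """Visit the n x m iteration space in the original row-major order."""
    return [(i, j) for i in range(n) for j in range(m)]


def tiled(n, m, tile_i, tile_j):
    """Visit the same iteration space tile by tile (two-dimensional tiles)."""
    order = []
    for ii in range(0, n, tile_i):            # outer loops enumerate tile origins
        for jj in range(0, m, tile_j):
            for i in range(ii, min(ii + tile_i, n)):      # inner loops scan one tile
                for j in range(jj, min(jj + tile_j, m)):
                    order.append((i, j))
    return order


if __name__ == "__main__":
    # The first tile of a 4 x 4 space with 2 x 2 tiles.
    print(tiled(4, 4, 2, 2)[:4])
```

Tiling only changes the order in which iterations are visited; the set of iterations stays the same, which is why its legality depends on the dependences of the original loop nest.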

5

On the Complexity of Optimal Parallel Cooperative Path-Finding

Surynek P.

Fundamenta Informaticae | 2015 | Vol. 137, nr 4 | 517--548

EN

A parallel version of the problem of cooperative path-finding (pCPF) is introduced in this paper. The task in CPF is to determine a spatio-temporal plan for each member of a group of agents. Each agent is given its initial location in the environment and its task is to reach the given goal location. Agents must avoid obstacles and must not collide with one another. The environment where agents are moving is modeled as an undirected graph. Agents are placed in vertices and they move along edges. At most one agent is placed in each vertex and at least one vertex remains unoccupied. In the standard version of CPF, an agent can only move into a currently unoccupied vertex. In the parallel version, an agent can also move into a vertex that is currently being vacated by another agent, provided that the movement is not cyclic. Particular attention is given to optimal pCPF, where the task is to find a solution with the smallest possible makespan. The main contribution of this paper is the proof of NP-completeness of the decision version of the optimal pCPF. A reduction of propositional satisfiability (SAT) to the problem is used in the proof.
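
The movement rule of the parallel version can be sketched as a validity check for a single time step. This is only one reading of the abstract, not the paper's formal definition: the function name `is_valid_parallel_step` and the dictionary representation of agent positions are assumptions, and the requirement that at least one vertex stays unoccupied is not checked here.

```python
def is_valid_parallel_step(edges, before, after):
    """Check one time step: `before` and `after` map each agent to a vertex."""
    adjacency = set()
    for u, v in edges:                      # the graph is undirected
        adjacency.add((u, v))
        adjacency.add((v, u))

    if before.keys() != after.keys():
        return False
    if len(set(after.values())) != len(after):
        return False                        # two agents would share a vertex

    occupant = {vertex: agent for agent, vertex in before.items()}
    for agent, src in before.items():
        dst = after[agent]
        if dst == src:
            continue                        # the agent waits
        if (src, dst) not in adjacency:
            return False                    # moves must follow an edge
        # Follow the chain of agents vacating vertices; it must end in a
        # vertex that was unoccupied, otherwise the movement is cyclic.
        seen = {src}
        current = dst
        while current in occupant:
            nxt = after[occupant[current]]
            if nxt in seen or nxt == current:
                return False
            seen.add(current)
            current = nxt
    return True


if __name__ == "__main__":
    path = [(1, 2), (2, 3), (3, 4)]
    print(is_valid_parallel_step(path, {"a": 1, "b": 2}, {"a": 2, "b": 3}))
```

Under this reading, a chain of agents advancing into a free vertex is allowed in a single step, while a swap of two agents or a rotation around a fully occupied cycle is rejected.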

6

Effective expectation maximization algorithm implementation using multicore computer systems

Kasitskij A., Bidyuk P., Gozhyi A.

Informatyka, Automatyka, Pomiary w Gospodarce i Ochronie Środowiska | 2014 | nr 4 | 35--37

EN

The paper considers the popular expectation maximization (EM) algorithm, which is widely used in modern data processing systems to solve various problems, including optimization and parameter estimation. The aim of the study was to reduce the execution time of the algorithm. The execution speed of the EM algorithm was improved by exploiting the multicore architecture of modern computer systems. The modifications needed for better parallelism in the implementation of the EM algorithm were proposed. The efficiency of the software implementation was tested on the classic problem of separating a mixture of Gaussian random variables. It is shown that in the mixture separation problem the performance of the EM algorithm degrades when the distance between the mean values of the distributions is less than three standard deviations, which is fully consistent with the three-sigma rule. In such cases, an efficient EM implementation is essential for processing the data in a reasonable time.

PL

W artykule opisany jest popularny algorytm EM (expectation maximization), który jest powszechnie stosowany w nowoczesnych systemach przetwarzania danych do rozwiązywania różnych problemów, w tym optymalizacji i estymacji parametrów. Celem badań było zwiększenie efektywności czasu wykonywania algorytmu. Do zwiększenia szybkości wykonania algorytmu EM użyto wielordzeniowej architektury nowoczesnych systemów komputerowych. Zostały zaproponowane niezbędne modyfikacje mające na celu lepszą równoległość realizacji algorytmu EM. Skuteczność implementacji programu była testowana na klasycznym problemie separacji Gaussowskich zmiennych losowych. Wykazano, że w przypadku rozdziału mieszaniny wydajność algorytmu EM ulega degradacji, kiedy odległość między średnimi wartościami rozkładu wynosi mniej niż trzy odchylenia standardowe, co jest całkowicie zgodne z regułą trzech sigm. W takich przypadkach jest bardzo ważne, aby mieć efektywną realizację algorytmu EM, aby móc przetworzyć takie przypadki w rozsądnym czasie.
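
For orientation, the sketch below runs plain sequential EM on a one-dimensional mixture of two Gaussians, the kind of mixture separation problem the abstract mentions. It is a textbook version, not the authors' multicore implementation: the function names, the initialization and the sample data are assumptions. The E-step treats every data point independently, which makes it a natural candidate for splitting across cores; the abstract does not say which modifications the authors actually made.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_gaussians(data, iterations=50):
    """Fit a two-component 1-D Gaussian mixture with plain EM."""
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    means = [min(data), max(data)]          # assumed initialization
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibilities, computed independently for each point.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var, 1e-6)   # floor to avoid a collapsing component
            weights[k] = nk / n
    return weights, means, variances


if __name__ == "__main__":
    sample = [-1.0, -0.5, 0.0, 0.5, 1.0, 9.0, 9.5, 10.0, 10.5, 11.0]
    print(em_two_gaussians(sample))
```

The sample above is well separated, so the two components are recovered quickly; when the means lie closer together than a few standard deviations, convergence slows down, which is the situation the abstract singles out.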

7

Substructural Meta-Theory of a Type-Safe Language for Web Programming

Cervesato I., Sans T.

Fundamenta Informaticae | 2014 | Vol. 130, nr 1 | 67--97

EN

This paper introduces an abstract web programming language, QWeSST, and a methodology for proving properties of formalisms, such as QWeSST, that are parallel, distributed and concurrent. At its core, QWeSST is a small functional programming language extended with primitives for mobile code and remote procedure calls, two distinguishing features of web programming. It supports a localized view of typechecking and of evaluation, which reflects the way we program web applications and web services. We have developed a prototype implementation for QWeSST and used it to elegantly write simple web applications that are nevertheless not easily expressed using current web technology. We give two semantics for QWeSST: one is standard and models a naive form of single-threaded evaluation; the other is maximally parallel and exploits a presentation of its typing and execution behaviors based on an extended form of substructural operational semantics. It augments standard inference rules with a construction that realizes parametric multiset comprehension, which makes it convenient to capture ensemble-level behaviors. We prove that both semantics are type safe, the former using traditional methods, the latter by developing a proof methodology that parallels the multiset-oriented presentation of the semantics.

8

Modular FEM framework “ModFEM” for generic scientific parallel simulations

Michalik K., Banaś K., Płaszewski P., Cybułka P.

Computer Science | 2013 | Vol. 14 (3) | 513--528

EN

We present the design for, and implementation of, a flexible and robust parallel modular finite element (FEM) framework called ModFEM. The design is based on reusable modules which use narrow and well-defined interfaces to cooperate. At the top of the architecture, there are problem-dependent modules. Problem-dependent modules can be additionally grouped together by “super-modules”. The structure allows for reusing the sequential code for parallel environments, and also supports solving multi-physics and multi-scale problems.

9

ModFem : a computational framework for parallel adaptive finite element simulations

Michalik K., Banaś K., Płaszewski P., Cybułka P.

Computer Methods in Materials Science | 2013 | Vol. 13, No. 1 | 3--8

EN

We present the design and implementation of a flexible and robust modular finite element framework called ModFem. The design is based on reusable modules which use narrow and well-defined interfaces to cooperate. At the top of the architecture there are problem-dependent modules, with the main module being an incompressible flow solver. Problem-dependent modules can be additionally grouped together by "super-modules", e.g. for the purpose of applying the created codes to multi-physics and multi-scale problems. Additionally, the framework aims to provide suitable infrastructure for parallel computations on both shared memory and distributed memory systems.

PL

Autorzy prezentują koncepcję i implementację szkieletu obliczeniowego do równoległych adaptacyjnych symulacji metodą elementów skończonych (MES) o nazwie ModFem. Głównym założeniem projektowym był podział całego szkieletu na moduły, połączone poprzez precyzyjnie zdefiniowane wąskie interfejsy. Na szczycie architektury modularnej znajdują się moduły odpowiedzialne za modelowanie konkretnych zjawisk fizycznych, w szczególności przepływów nieściśliwych. Ponadto moduły mogą być łączone z innymi modułami problemowymi w super-moduły, m.in. w celu użycia istniejących rozwiązań podczas modelowania problemów ze sprzężeniem wielu różnych zjawisk fizycznych oraz modelowania wieloskalowego. Dodatkowo szkielet wspiera wykorzystywanie równoległości zarówno na poziomie pamięci współdzielonej, jak i rozproszonej.

10

Akceleracja obliczeń komputerowych za pomocą układów graficznych z wykorzystaniem technologii CUDA

Stefanowicz Ł., Wiśniewski R., Wiśniewska M.

Pomiary Automatyka Kontrola | 2011 | R. 57, nr 8 | 954-956

PL

W artykule zaprezentowano możliwość zastosowania układów graficznych celem przyspieszenia obliczeń komputerowych. Przedstawiono technologię oraz architekturę CUDA firmy nVidia, a także podstawowe rozszerzenia względem standardów języka C. W referacie omówiono autorskie algorytmy testowe oraz metodykę badań, które przeprowadzono w celu określenia skuteczności akceleracji obliczeń komputerowych z wykorzystaniem procesorów graficznych GPU w porównaniu do rozwiązań tradycyjnych, opartych o CPU.

EN

The paper deals with the application of graphics processing units (GPUs) to accelerating computer operations and computations. Traditional computation methods are based on the central processing unit (CPU), which has to handle all computer operations and tasks. Such a solution is particularly ineffective for distributed systems, where some sub-tasks can be performed in parallel. Running many threads in parallel can accelerate computing and shorten execution time. The paper presents the new CUDA technology and architecture. CUDA is based on using GPUs for computation to achieve better performance than traditional CPU-based methods. GPUs can perform multi-threaded calculations; therefore, especially for tasks where concurrency can be applied, CUDA may greatly speed up the computation process. The effectiveness of CUDA technology was verified experimentally. The investigations and experiments used the authors' own test modules. The library of benchmarks consists of various algorithms, from simple iteration scripts to video processing methods. The results obtained from calculations performed on the CPU and on the GPU are compared and discussed.

11

EASEA : a generic optimization tool for GPU machines in asynchronous island model

Baumes L. A., Kruger F., Collet P.

Computer Methods in Materials Science | 2011 | Vol. 11, No. 3 | 489-499

EN

Very recently, we presented an efficient implementation of Evolutionary Algorithms (EAs) using Graphics Processing Units (GPUs) for solving microporous crystal structures. Because of both the inherent complexity of zeolitic materials and the constant pressure to accelerate R&D solutions, an asynchronous island model running on clusters of machines equipped with GPU cards, i.e. the current trend for supercomputers and cloud computing, is presented. This latest improvement of the EASEA platform allows an effortless exploitation of hierarchical massively parallel systems. It is demonstrated that supra-linear speedup over one machine and linear speedup across clusters of different sizes are obtained. Such an island implementation over several potentially heterogeneous machines opens new horizons for various application domains where computation time for optimization remains the principal bottleneck.

PL

W swojej poprzedniej pracy Autorzy przedstawili wydajną implementację Algorytmów Ewolucyjnych (ang. Evolutionary Algorithms - EA) z zastosowaniem procesorów graficznych (Graphics Processing Units, GPU) do rozwiązywania struktur krystalicznych z mikroporami. Ze względu na skomplikowanie materiałów zeolitycznych oraz ciągłą presję na poprawę efektywności symulacji, w niniejszej pracy zaproponowano asynchroniczny model wyspowy na klastrach maszyn wyposażonych w karty GPU. Jest to najnowszy trend w zakresie superkomputerów oraz obliczeń w chmurze (ang. cloud computing). To ostatnie usprawnienie platformy EASEA (ang. EAsy Specification of Evolutionary Algorithms, łatwa specyfikacja algorytmów ewolucyjnych) pozwala na łatwą eksploatację rozbudowanych systemów (komputerów) masowo równoległych. Pokazano, że można osiągnąć ponadliniowe przyspieszenie w stosunku do jednej maszyny oraz liniowe przyspieszenie, stosując klastry o różnych rozmiarach. Taka implementacja wyspowa dla kilku potencjalnie heterogenicznych maszyn otwiera nowe perspektywy dla różnych obszarów zastosowań, w których czasy obliczeń odgrywają kluczową rolę.

12

Using transitive closure and transitive reduction to extract coarse-grained parallelism in program loops

Bielecki W., Pałkowski M., Siedlecki K.

Pomiary Automatyka Kontrola | 2010 | R. 56, nr 8 | 976-979

EN

A technique for extracting coarse-grained parallelism available in loops is presented. It is based on splitting a set of dependence relations into two sets. The first is used to generate code scanning slices, while the second makes it possible to insert send and receive functions that synchronize the execution of the slices. The paper shows how to remove redundant synchronization from the generated code by means of the transitive reduction operation. Experimental results are discussed: the number of synchronization points that can be removed, and the speed-up and efficiency of the examined parallel loops.

PL

W artykule zaprezentowano technikę ekstrakcji równoległości gruboziarnistej w pętlach programowych. Bazuje ona na podziale relacji zależności na dwa zbiory: na podstawie pierwszego generowany jest kod skanujący niezależne fragmenty, natomiast drugi służy do wstawienia funkcji send i receive (wyślij i odbierz) służących do synchronizacji tych fragmentów. Operacje te zrealizowano za pomocą semaforów, możliwe jest jednak wykorzystanie innej konstrukcji, bardziej wydajnej dla danego środowiska. Algorytm generuje kod z zaznaczonymi punktami synchronizacji, nie narzuca jednak ich implementacji. W artykule przeanalizowano technikę wyszukiwania i eliminacji zbędnych punktów synchronizacji. Ekstrakcja równoległości za pomocą fragmentów kodu bazuje na operacji tranzytywnego domknięcia, znanej także z teorii grafów. Operacja ta jest również wykorzystana do obliczenia tranzytywnej redukcji, za pomocą której eliminowana jest nadmiarowa synchronizacja. Usuwanie zbędnej komunikacji pomiędzy wątkami obliczeń jest istotne, zwłaszcza dla komputerów z pamięcią dzieloną, w których koszt jej obsługi jest znaczący. Celem jest zatem uzyskanie gruboziarnistego kodu równoległego. Zbadano także wyniki przeprowadzonych eksperymentów pod kątem przyspieszenia i efektywności obliczeń.
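
The role of transitive reduction in removing redundant synchronization can be illustrated on a small finite dependence graph. The paper works with dependence relations over loop iteration spaces, so the sketch below is only an analogy on an explicit acyclic graph: the function names and the example edges are assumptions. An edge (u, v) stands for "v must wait for u", and an edge is redundant when the same ordering already follows from a longer chain.

```python
def transitive_closure(nodes, edges):
    """Warshall-style closure: all pairs (i, j) such that j is reachable from i."""
    reach = set(edges)
    for k in nodes:
        for i in nodes:
            if (i, k) in reach:
                for j in nodes:
                    if (k, j) in reach:
                        reach.add((i, j))
    return reach


def transitive_reduction(nodes, edges):
    """Drop edges implied by longer paths (assumes the graph is acyclic)."""
    closure = transitive_closure(nodes, edges)
    redundant = {
        (u, v)
        for (u, v) in edges
        for w in nodes
        if w != u and w != v and (u, w) in closure and (w, v) in closure
    }
    return set(edges) - redundant


if __name__ == "__main__":
    nodes = [1, 2, 3, 4]
    sync = {(1, 2), (2, 3), (1, 3), (3, 4), (1, 4)}
    print(sorted(transitive_reduction(nodes, sync)))
```

In this example the waits (1, 3) and (1, 4) can be dropped, because the chain 1, 2, 3, 4 already enforces the same ordering.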

13

Wyznaczenie punktów reprezentatywnych niezależnych fragmentów kodu w grafie zależności pętli programowych

Bielecki W., Pałkowski M., Klimek T.

Metody Informatyki Stosowanej | 2010 | nr 1 (22) | 13-20

PL

W artykule przedstawiono nowy algorytm wyznaczania punktów reprezentatywnych cechujący się mniejszą złożonością obliczeń w porównaniu do rozwiązania [6-7]. Powodzenie wyznaczania punktów jest zależne tylko od obliczenia dokładnego tranzytywnego domknięcia unii relacji zależności pętli. Oprócz tego należy wykonać szereg podstawowych operacji, jak: część wspólna, iloczyn skalarny, unia, aplikacja relacji na zbiorze, inwersja, projekcja. Relacja RUSC budowana jest wieloetapowo, dzięki czemu można dokonywać pośrednich uproszczeń jej postaci. Opisane podejście zostało zaimplementowane i przetestowane pod kątem skuteczności na zbiorze pętli testowych NAS. W dalszych badaniach planowane jest zbadanie proponowanego algorytmu z innymi zbiorami pętli testowych oraz dalsze udoskonalanie algorytmów do wyznaczania fragmentów dla dowolnej topologii zależności pod kątem generowania wydajnego kodu równoległego.

EN

An algorithm for finding representatives of synchronization-free slices available in program loops is presented. It is based on the transitive closure of a union of dependence relations describing all the dependences in program loops. An algorithm to calculate transitive closure is studied. Both algorithms are implemented by means of the Omega library. The results of experiments with the NAS Parallel Benchmark are discussed.

14

Extracting representative loop statement instances of synchronization-free slices

Bielecki W., Palkowski M., Beletska A.

Pomiary Automatyka Kontrola | 2009 | R. 55, nr 10 | 807-810

EN

Extracting synchronization-free parallelism by means of the Iteration Space Slicing Framework consists of two steps. First, representative loop statement instances of slices are extracted. Next, slices are reconstructed from their representatives and parallel code scanning slices and elements of each slice is generated. In this paper, we present how to benefit from this technique in practice. We explain how to extract representative loop statement instances of slices by means of the Omega Library enlarged by four new functions allowing us to simplify the process of extracting slice representatives. Results of experiments with the NAS and UTDSP benchmarks are presented.

PL

Rozwój architektur wielordzeniowych wymusza poszukiwanie algorytmów automatycznego zrównoleglenia aplikacji. W artykule opisano zrównoleglenie pętli programowych za pomocą ekstrakcji niezależnych fragmentów kodu. Ekstrakcja równoległości w pętlach programowych pozbawionych synchronizacji za pomocą podziału przestrzeni iteracji składa się z dwóch kroków. Najpierw znajdowane są instancje instrukcji będące początkami fragmentów kodu. Następnie fragmenty kodu uzupełniane są o wszystkie instrukcje i generowany jest kod równoległy. W artykule przedstawiono korzyści wynikające z takiego podejścia. Wyjaśniono sposób poszukiwania instancji instrukcji fragmentów kodu za pomocą biblioteki Omega rozszerzonej o nowe funkcje upraszczające poszukiwanie instrukcji należących do fragmentów kodu. Opis proponowanego podejścia uzupełniono o zbiór eksperymentów na pętlach testowych NAS i UTDSP.

15

Hermite spline interpolation on patches for parallelly solving the Vlasov-Poisson equation

Crouseilles N., Latu G., Sonnendrücker E.

International Journal of Applied Mathematics and Computer Science | 2007 | Vol. 17, no 3 | 335-349

EN

This work is devoted to the numerical simulation of the Vlasov equation using a phase space grid. In contrast to Particle-In-Cell (PIC) methods, which are known to be noisy, we propose a semi-Lagrangian-type method to discretize the Vlasov equation in the two-dimensional phase space. As this kind of method requires a huge computational effort, one has to carry out the simulations on parallel machines. For this purpose, we present a method using patches decomposing the phase domain, each patch being devoted to a processor. Some Hermite boundary conditions allow for the reconstruction of a good approximation of the global solution. Several numerical results demonstrate the accuracy and the good scalability of the method with up to 64 processors. This work is a part of the CALVI project.