Wyniki wyszukiwania - BazTech

1

A rationalized structure of processing unit to multiply 3x3 matrices

Cariow A., Sysło W., Cariowa G., Gliszczyński M.

Pomiary Automatyka Kontrola

|

2012

|

R. 58, nr 7

677-680

EN

This paper presents a high-speed parallel 3x3 matrix multiplier structure. To reduce the hardware complexity of the multiplier structure, we propose to modify the Makarov's algorithm for 3?3 by 3?3 matrix multiplication. The process of matrix product calculation is successively decomposed so that a minimal set of multipliers and fewer adders are used to generate partial results which are combined to generate the final results. Thus, our proposed modification reduces the number of adders compared to the direct implementation of the Makarov's algorithm, and takes advantage of parallelism of calculation offered by field-programmable gate arrays (FPGA's).

PL

W pracy została przedstawiona struktura jednostki procesorowej do wyznaczania iloczynu dwóch macierzy trzeciego stopnia. W odróżnieniu od implementacji naiwnego sposobu zrównoleglenia obliczeń wymagającego 27 układów mnożących proponowana równoległa struktura wymaga tylko 22 układa mnożących. A ponieważ układ mnożący pochłania znacznie więcej zasobów sprzętowych platformy implementacyjnej niż sumator, to minimalizacja układów mnożących przy projektowaniu mikroelektronicznych jednostek procesorowych jest sprawą nadrzędną. Zasada budowy proponowanej jednostki oparta jest na realizacji autorskiej modyfikacji metody Makarova, z tym, że implementacja naszej modyfikacji wymaga o 38 sumatorów mniej niż implementacja metody Makarova. Zaproponowana struktura może bycz z powodzeniem zastosowana do akceleracji obliczeń w podsystemach cyfrowego przetwarzania danych zrealizowanych na platformach FPGA oraz zaimplementowana w dowolnym środowisku sprzętowym, na przykład zrealizowana w postaci układu ASIC. W tym ostatnim przypadku niewątpliwym atutem wyróżniającym przedstawione rozwiązanie jest to, że zaprojektowany w ten sposób układ będzie zużywać mniej energii oraz wydzielać mniej ciepła.

2

Zwektoryzowane algorytmy obliczania Transformaty S dla środowiska wieloprocesorowego

Tariov A., Gliszczyński M.

Pomiary Automatyka Kontrola

|

2011

|

R. 57, nr 11

1401-1403

PL

W pracy został zaprezentowany wektoryzowany algorytm obliczania transformaty S w dwóch wariantach - w postaci sekwencyjno-równoległej pozwalającej na oszczędzenie zasobów sprzętowych oraz w postaci równoległej pozwalającej wykorzystać, nowoczesne wielordzeniowe platformy obliczeniowe. W drugim przypadku możliwa jest znaczna redukcja czasu trwania algorytmu. Obie metody mogą znaleźć zastosowanie praktyczne zależnie od oczekiwanej dokładności (rozdzielczości) i szybkości działania jak też możliwości platformy obliczeniowej.

EN

In the paper the algorithm for calculating N by N-point S Transform is presented. In a sequential, recursive option hardware resources saving is available, while on the other hand, a parallel version of the algorithm allows increasing the accuracy and reducing the time when using multi-core platforms. Two of these approaches can be implemented in practical use depending on the expected accuracy, speed and power of the hardware platform. At the beginning of the paper uses of S Transform with other similar solutions are described. Advantages and disadvantages of S Transform, which are good properties of the time-frequency analysis of non-stationary signals thanks to a movable, different sized Gaussian window, but at the same time a long computation time of the standard, sequential method, are considered. Next, the theoretical, continuous form of the transform and the discrete form with the sequential algorithm are presented. Later The main part of the work deals with synthesis of the sequential and parallel version of the algorithm in the matrix-vector form. The data flow in the algorithms in space and time is shown in Figs. 1 and 2 (for sequential and parallel approach). Finally, the computation times of two versions are compared. The advantage of the two presented approaches is simple and understandable tensor product representation which makes the implementation easy. The sequential algorithm can be used for slower platforms, where the real time analysis is not necessary, while the parallel version offers quick computation on multi-core processors.

3

Aspekty algorytmiczne organizacji układu do mnożenia długich operandów

Gliszczyński M., Tariov A.

Pomiary Automatyka Kontrola

|

2010

|

R. 56, nr 10

1130-1132

PL

W pracy została zaprezentowana zracjonalizowana struktura algorytmiczna do obliczania iloczynu dwóch operandów N-elementowych ze zredukowaną liczbą układów mnożących i sumatorów dla dowolnej dużej wartości. Pozwala to przy implementacji zmniejszyć nakłady obliczeniowe lub zapotrzebowanie na zasoby sprzętowe oraz stworzyć dogodne warunki do efektywnej realizacji operacji mnożenia dużych liczb w dowolnym sprzętowo-programowym środowisku implementacyjnym.

EN

In the paper the fast algorithm for two long, N-digits numbers multiplication with reduced number of multipliers and adders for any large value is presented. In the first paragraph a theoretical problem with known solutions is presented. While naive N-bits numbers multiplication requires N×N fragmentary, one-bit multiplications, the Karatsuba approach using simple transformations (1-3) and recursion provides smaller complexity. There are also described other, faster algorithms of more complicated structure (Toom-3, Schönhage-Strassen). The algorithm presented in this paper is based on the Karatsuba method because of its simplicity. In the second paragraph there is given the synthesis of a matrix-based, tensor product algorithm for large operand multiplications with no recursion needed. All matrix constructions are predefined. Next, the graph-structural models of the algorithm with time/space flow of data for N=2 and N=4 are shown (Figs. 1, 2). At the end, estimation of the number of multipliers and adders according to the operands length, supported by Table 1, is discussed (11, 12). The advantages of the new algorithm are simple and clear matrices-based construction, easy implementation, no need of recursion, possibility of parallel execution and reduced number of multipliers. Next problem to consider is profitability of the proposed approach in multi-core hardware implementation depending on N value.

4

A computation algorithm for Strassen's matrix multiplication

Tariov A., Gliszczyński M.

Pomiary Automatyka Kontrola

|

2010

|

R. 56, nr 7

691-693

EN

In the work the vectorized algorithm for Strassen's matrix product calculating is presented. Unlike the proposed in other works "some recommendations" relating to the Strassen's matrix multiplication implementation, we offer specific computational procedures that allow correctly describe the entire sequence of transformations needed to obtain the final result. The proposed algorithm can be successfully applied to accelerate calculations in the FPGA-based platforms.

PL

W pracy został przedstawiony wektoryzowany algorytm wyznaczenia iloczynu macierzowego Strassena. W odróżnieniu od poruszanych w innych publikacjach wybranych uwag dotyczących realizacji metody Strassena w niniejszej pracy zaproponowane są konkretne procedury, opisujące cały proces obliczeniowy i pozwalające na podstawie wykonania skończonej liczby etapów przetwarzania danych wejściowych otrzymać wynik końcowy. Została roztrząśnięta synteza proponowanego algorytmu oraz pokazana postać stosownego grafu przepływowego dla przykładu mnożenia macierzy drugiego rzędu. Zaproponowany algorytm może być sukcesywnie zastosowany do przyspieszonej realizacji obliczeń w platformach FPGA oraz zaimplementowany w wybranym środowisku sprzętowym. Niewątpliwym atutem odróżniającym przedstawione rozwiązanie od tradycyjnego algorytmu jest również brak rekurencji obliczeń, co daje dodatkowy zysk przy zrównolegleniu procesu wyznaczenia iloczynu.

5

Szybki algorytm splotu kołowego dla N = 2^m

Gliszczyński M., Tariov A.

Pomiary Automatyka Kontrola

|

2009

|

R. 55, nr 8

566-568

PL

W pracy został przedstawiony szybki algorytm liczenia splotu kołowego N-elementowych wektorów danych ze zredukowaną liczbą operacji arytmetycznych (lub układów mnożących i sumatorów, jeśli chodzi o implementację sprzętową) w przypadku, gdy N=2^m, m - liczba całkowita. Pozwala to przy implementacji zmniejszyć nakłady obliczeniowe lub zapotrzebowanie na zasoby sprzętowe oraz stworzyć dogodne warunki do efektywnej realizacji operacji splotu kołowego w dowolnym sprzętowo-programowym środowisku implementacyjnym.

EN

In the work the fast algorithm for 2n-point circular convolution calculating with the reduced number of arithmetic operations (or multipliers and adders - in hardware implementation case) is presented. Computational procedure for describing the algorithm, based on the successful decomposition of the circulant matrix of arbitrary order is shown. This approach allows to lower hardware expenses and to create favorable conditions for effective convolution realization in the reprogrammable platform. Computational procedure for circular convolution realization can be described by means of matrix algebra notation. Matrix algebra offers not only a formalism for describing the algorithm, but it enables the derivation by pure algebraic manipulations of an algorithm that is well suited to be implemented in vector and matrix digital signal processors with various levels of parallelism. In addition, the mentioned procedures can be directly used for easy implementation in matrix-oriented languages like Matlab.