Search results - Biblioteka Nauki

1

Hardware-efficient algorithms for implementation of the GHM discrete multiwavelet transform kernels

100%

Cariow A. , Cariowa G.

Measurement Automation Monitoring

|

2016

|

Vol. 62, No. 6

190--192

EN

In this correspondence, we discuss two efficient algorithms for executing the basic operations of the forward (FDMWT) and inverse (IDMWT) discrete multiwavelet transforms with reduced computational complexity. We use the multiwavelet basis proposed by Geronimo, Hardin, and Massopust (GHM). A direct implementation of the GHM-FDMWT basic operation requires 23 multiplications and 19 additions, and a direct implementation of the GHM-IDMWT basic operation requires 23 multiplications and 16 additions. Our solutions yield computation procedures that take only 10 multiplications and 15 additions for the GHM-FDMWT basic operation, and 10 multiplications and 10 additions for the GHM-IDMWT basic operation.

2

A rationalized algorithm for complex-valued inner product calculation

100%

Cariow A. , Cariowa G.

Pomiary Automatyka Kontrola

|

2012

|

Vol. 58, No. 7

674-676

EN

This paper presents a rationalized algorithm for calculating a complex-valued inner product. The synthesis of the algorithm relies on the well-known fact that the product of two complex numbers can be calculated with three multiplications and five additions of real numbers. Thus, the proposed algorithmic solution reduces the number of real multiplications and additions compared to the schoolbook implementation, and takes advantage of the parallelization of calculations offered by field-programmable gate arrays (FPGAs).

PL (translated)

This article presents a parallel algorithm for computing the inner product of two vectors whose elements are complex numbers. Compared with a fully parallel implementation of the naive method, the proposed algorithm has reduced multiplicative complexity. Whereas the naive method requires 4N multiplications (multipliers in a hardware implementation) and 2(2N-1) additions (adders) of real numbers, the proposed algorithm requires only 3N multiplications and 6N-1 additions. The paper presents a rationalized vector-matrix computational procedure for calculating such products and defines the matrix constructions that make up this procedure. In a hardware implementation, the proposed algorithm has clear advantages over an implementation of the naive parallelization, which requires more multiplier blocks. Because a multiplier block consumes far more hardware resources of the implementation platform than an adder, reducing the number of these blocks when designing computational units is a highly topical problem. When the inner-product unit is implemented in an FPGA, the proposed solution saves part of the chip's pool of embedded multipliers or logic elements.
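
The operation counts quoted in this abstract (3N real multiplications and 6N-1 real additions) can be reproduced with a short sketch. It is not the authors' vector-matrix procedure, only one arrangement of the three-multiplication complex product that reaches the same totals. The function name `complex_inner_product_3m` and the unconjugated definition sum(x[i] * y[i]) are assumptions made for illustration, and the vectors are assumed to be non-empty.

```python
def complex_inner_product_3m(x, y):
    """Return (value, n_mul, n_add) for sum(x[i] * y[i]), without conjugation.

    Illustrative sketch: each complex product uses 3 real multiplications,
    and the three partial products are accumulated separately so that the
    final combination costs only 2 additions in total.
    """
    n_mul = 0
    n_add = 0
    sums = None  # running sums of the three partial products
    for xi, yi in zip(x, y):
        a, b = xi.real, xi.imag
        c, d = yi.real, yi.imag
        k1 = c * (a + b)   # 1 multiplication, 1 addition
        k2 = a * (d - c)   # 1 multiplication, 1 addition
        k3 = b * (c + d)   # 1 multiplication, 1 addition
        n_mul += 3
        n_add += 3
        if sums is None:
            sums = [k1, k2, k3]
        else:
            sums = [sums[0] + k1, sums[1] + k2, sums[2] + k3]
            n_add += 3
    re = sums[0] - sums[2]  # sum of (ac - bd)
    im = sums[0] + sums[1]  # sum of (ad + bc)
    n_add += 2
    return complex(re, im), n_mul, n_add
```

For N = 3 the counters come out as 9 multiplications and 17 additions, which agrees with 3N and 6N-1.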

3

Hardware-Efficient Structure of the Accelerating Module for Implementation of Convolutional Neural Network Basic Operation

100%

Cariow A. , Cariowa G.

Measurement Automation Monitoring

|

2018

|

Vol. 64, No. 2

40--42

EN

This paper presents a structural design of a hardware-efficient module for implementing the basic operation of a convolutional neural network (CNN) with reduced implementation complexity. For this purpose we use a modification of Winograd's minimal filtering method as well as computation vectorization principles. The module calculates the inner products of two consecutive segments of the original data sequence, formed by a sliding window of length 3, with the elements of a filter impulse response. A fully parallel structure for calculating these two inner products, based on the naïve method, requires 6 binary multipliers and 4 binary adders. Using Winograd's minimal filtering method makes it possible to construct a module that requires only 4 binary multipliers and 8 binary adders. Since a high-performance convolutional neural network can contain tens or even hundreds of such modules, this reduction can have a significant effect.
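
The 4-multiplier, 8-adder figure matches the textbook Winograd F(2,3) minimal filtering scheme, sketched below. This is the standard form, not the authors' modification, and the name `winograd_f23` is an assumption. The filter-side combinations are treated as precomputed constants, so they do not count toward the run-time adders.

```python
def winograd_f23(d, g):
    """Two outputs of a 3-tap filter over a 4-sample window.

    y0 = d0*g0 + d1*g1 + d2*g2
    y1 = d1*g0 + d2*g1 + d3*g2
    Standard Winograd F(2,3): 4 multiplications and 8 data-side additions.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter-side terms: fixed for a given filter, assumed precomputed.
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    # Four input-side additions, then four multiplications.
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * g2
    # Four output-side additions.
    y0 = m1 + m2 + m3
    y1 = m2 - m3 - m4
    return y0, y1
```

The naïve computation of the same two outputs uses 6 multiplications and 4 additions, so the scheme trades 2 multipliers for 4 extra adders.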

4

Low complexity algorithm for multiplying octonions

100%

Cariow A. , Cariowa G.

Przegląd Elektrotechniczny

|

2014

|

Vol. 90, No. 2

109--112

EN

We propose an original algorithm for the multiplication of octonions. In previously published algorithms for computing the octonion product, the number of multiplications was reduced at the cost of a significant increase in the number of additions and shifts. The advantage of the proposed solution is that it reduces the number of multiplications by 25% compared with the naive method while keeping the number of additions the same. In synthesizing the algorithm, we use the fact that the octonion product can be represented as a matrix-vector product. This representation makes it possible to find repeating elements in the matrix structure and to exploit their mutual placement to reduce the number of real multiplications needed to calculate the octonion product.

PL (translated)

This article presents a fast algorithm for computing the product of octonions. Compared with the naive algorithm, it requires 25% fewer multiplications while keeping the same number of additions of real numbers.

5

A hardware-oriented algorithm for complex-valued constant matrix-vector multiplication

100%

Cariow A. , Cariowa G.

Przegląd Elektrotechniczny

|

2017

|

Vol. 93, No. 1

87--90

EN

In this communication we present a hardware-oriented algorithm for calculating a constant matrix-vector product when all elements of the vector and the matrix are complex numbers. The main idea behind our algorithm is to combine the advantages of Winograd's inner product formula with Gauss's trick for complex number multiplication. Compared with the naïve method, the proposed algorithm drastically reduces the number of multipliers required for an FPGA implementation of complex-valued constant matrix-vector multiplication. Whereas a fully parallel hardware implementation of the naïve (schoolbook) method requires 4MN multipliers, 2M N-input adders and 2MN two-input adders, the proposed algorithm requires only 3N(M+1)/2 multipliers, [3M(N+2)+1.5N+2] two-input adders and 3(M+1) N/2-input adders.

PL (translated)

This communication presents a hardware-oriented algorithm for multiplying a constant matrix by a vector of variables, assuming that the elements of both the matrix and the vector are complex numbers. The main idea of the proposed algorithm is the joint use of Winograd's inner product formula and Gauss's formula for complex multiplication. Compared with the traditional way of performing the calculations, the proposed algorithm reduces the number of multipliers needed for a fully parallel FPGA implementation of a unit that computes the matrix-vector product. Whereas a fully parallel implementation of the traditional method requires 4MN multipliers, 2M N-input adders and 2MN two-input adders, the proposed algorithm requires only 3N(M+1)/2 multipliers, [3M(N+2)+1.5N+2] two-input adders and 3(M+1) N/2-input adders.
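
The multiplier count in this abstract has the form 3 x N(M+1)/2, which is consistent with applying a three-multiplication complex product on top of Winograd's inner product formula. The sketch below shows only the real-valued Winograd part for a constant M x N matrix with even N; it is an illustration under that assumption, not the authors' complex-valued structure, and `winograd_matvec` is a hypothetical name.

```python
def winograd_matvec(A, x):
    """Return (y, n_mul) with y = A x, using Winograd's inner product formula.

    A is a constant real matrix given as M rows of even length N.
    Terms that depend only on A are treated as precomputed and not counted.
    """
    n = len(x)
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even vector length")
    half = n // 2
    n_mul = 0
    # Row terms depend only on the constant matrix (computed offline).
    row_terms = [sum(r[2 * j] * r[2 * j + 1] for j in range(half)) for r in A]
    # Vector term: N/2 multiplications, shared by every row.
    x_term = 0
    for j in range(half):
        x_term += x[2 * j] * x[2 * j + 1]
        n_mul += 1
    y = []
    for r, r_term in zip(A, row_terms):
        acc = 0
        for j in range(half):
            # (a1 + x2)(a2 + x1) = a1*a2 + a1*x1 + a2*x2 + x1*x2
            acc += (r[2 * j] + x[2 * j + 1]) * (r[2 * j + 1] + x[2 * j])
            n_mul += 1
        y.append(acc - r_term - x_term)
    return y, n_mul
```

With M = 3 and N = 4 the counter reports 8 multiplications, that is N(M+1)/2, against MN = 12 for the schoolbook method.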

6

An algorithm for multiplication of Dirac numbers

100%

Cariow A. , Cariowa G.

Journal of Theoretical and Applied Computer Science

|

2013

|

Vol. 7, No. 4

26--34

EN

This work presents a rationalized algorithm for the multiplication of Dirac numbers. The algorithm has low computational complexity and is well suited to parallel computation. Computing the product of two Dirac numbers with the naïve method takes 256 real multiplications and 240 real additions, while the proposed algorithm computes the same result with only 128 real multiplications and 160 real additions. In synthesizing the algorithm, we use the fact that the product of Dirac numbers can be represented as a vector-matrix product. The matrix in this product has unique structural properties that permit an advantageous decomposition, and it is this decomposition that significantly reduces the computational complexity.

7

Representation of sedenions multiplication via matrix-vector product

100%

Cariowa G. , Cariow A.

Metody Informatyki Stosowanej

|

2011

|

No. 1

133-139

EN

The article shows how to represent the multiplication of two sedenions as a vector-matrix product. Matrix algebra offers not only a formalism for describing the algorithm, but also enables the derivation, by purely algebraic manipulations, of an algorithm that is well suited to implementation in vector and matrix digital data processors with various levels of parallelism. In addition, the procedures can be used directly for easy implementation in matrix-oriented languages such as Matlab.

8

An algorithm for complex-valued vector-matrix multiplication

100%

Cariow A. , Cariowa G.

Przegląd Elektrotechniczny

|

2012

|

Vol. 88, No. 10b

213-216

EN

In this note we present an algorithm for calculating the vector-matrix product when the elements of the vector and the matrix are complex numbers.

PL (translated)

This article presents a rationalized algorithm for computing the vector-matrix product for complex-valued data. Compared with the naive method, the proposed algorithm has reduced multiplicative complexity. Whereas the naive method requires 4MN multiplications and 2M(2N-1) additions of real numbers, the proposed algorithm requires only 3MN multiplications and N+M(5N-1) additions.
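
The totals 3MN and N+M(5N-1) can be checked with a small counting sketch. It assumes M output elements, each an inner product of length N, and shares the sum of real and imaginary parts of every vector element across all outputs. This is one arrangement that reproduces the stated counts, not necessarily the authors' procedure, and `complex_matvec_3m` is a hypothetical name.

```python
def complex_matvec_3m(A, x):
    """Return (y, n_mul, n_add) with y[m] = sum(A[m][i] * x[i]).

    A holds M rows of N complex numbers and x holds N complex numbers.
    Each complex product costs 3 real multiplications.
    """
    n_mul = 0
    n_add = 0
    # One (re + im) sum per vector element, reused by every row: N additions.
    s = [xi.real + xi.imag for xi in x]
    n_add += len(x)
    y = []
    for row in A:
        acc = None  # running sums of the three partial products
        for xi, si, aij in zip(x, s, row):
            a, b = xi.real, xi.imag
            c, d = aij.real, aij.imag
            k = (c * si, a * (d - c), b * (c + d))
            n_mul += 3
            n_add += 2  # (d - c) and (c + d)
            if acc is None:
                acc = k
            else:
                acc = (acc[0] + k[0], acc[1] + k[1], acc[2] + k[2])
                n_add += 3
        # Real part is sum(ac - bd), imaginary part is sum(ad + bc).
        y.append(complex(acc[0] - acc[2], acc[0] + acc[1]))
        n_add += 2
    return y, n_mul, n_add
```

For M = 2 and N = 3 the counters give 18 multiplications and 31 additions, matching 3MN and N+M(5N-1).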

9

An unified approach for developing rationalized algorithms for hypercomplex number multiplication

100%

Cariow A. , Cariowa G.

Przegląd Elektrotechniczny

|

2015

|

Vol. 91, No. 2

36-39

EN

In this article we present a common approach to developing algorithms for calculating products of hypercomplex numbers. The main idea is to represent the multiplication of hypernumbers as a matrix-vector product and then to decompose the matrix creatively, which reduces the arithmetical complexity of the calculations. The approach allows the construction of sufficiently good algorithms for hypernumber multiplication with reduced computational complexity. Whereas the schoolbook method requires N² real multiplications and N(N-1) real additions, the proposed approach yields algorithms that take only [N(N-1)/2]+2 real multiplications and 3N log₂N+[N(N-3)+4]/2 real additions.

PL (translated)

This article presents a generalized approach to the synthesis of algorithms for computing products of hypercomplex numbers. The main idea is to represent the multiplication of hypercomplex numbers as a vector-matrix product and then to decompose the matrix factor creatively, which reduces the computational complexity. The approach yields algorithms with lower computational complexity than the naive method. Whereas the naive method requires N² multiplications and N(N-1) additions of real numbers, the proposed approach makes it possible to synthesize algorithms requiring only [N(N-1)/2]+2 multiplications and 3N log₂N+[N(N-3)+4]/2 additions.

10

An algorithm for multiplication of trigintaduonions

100%

Cariow A. , Cariowa G.

Journal of Theoretical and Applied Computer Science

|

2014

|

Vol. 8, No. 1

50--75

EN

In this paper we introduce an efficient algorithm for the multiplication of trigintaduonions. The direct multiplication of two trigintaduonions requires 1024 real multiplications and 992 real additions. We show how to compute a trigintaduonion product with 498 real multiplications and 943 real additions. In synthesizing the algorithm, we use the fact that trigintaduonion multiplication can be represented as a vector-matrix product. This representation makes it possible to find repeating elements in the matrix structure and to exploit their mutual placement to decrease the number of real multiplications needed to compute the product of two trigintaduonions.

11

The new module for rules discovering and visualization for NovoSpark® Visualizer software

100%

Pilipczuk O. , Cariowa G.

Przegląd Elektrotechniczny

|

2015

|

Vol. 91, No. 11

197-200

EN

In this paper we present a new rough sets module for the NovoSpark® Visualizer (NV) software. We describe the NV system architecture and the place of the new module in it. We also present the procedure for rough sets analysis with the NV software. In addition, an example of rule discovery and visualization is provided to evaluate the proposed module. The results show that useful rules are discovered efficiently from the data set.

PL (translated)

This article presents the design of a new module that automates rough set theory for the NovoSpark® Visualizer (NV) software. The system architecture is described and the place of the new module within it is indicated. The procedure for rough set analysis and visualization in the system is also presented, together with an example of rule discovery and visualization using the developed procedure. The experiment yielded a number of useful decision rules.

12

A Hardware-Efficient Structure of Complex Numbers Divider

100%

Cariow A. , Cariowa G.

Measurement Automation Monitoring

|

2017

|

Vol. 63, No. 6

212--213

EN

This correspondence presents an efficient structure for a hardware accelerator that calculates the quotient of two complex numbers with a reduced number of underlying binary multipliers. A fully parallel implementation of complex-number division using the conventional structure requires 4 multipliers, 3 adders, 2 squarers and 2 dividers, while the proposed structure requires only 3 multipliers, 6 adders, 2 squarers and 2 dividers. Because the hardware complexity of a binary multiplier grows quadratically with operand size, whereas that of a binary adder grows linearly, a complex-number divider structure containing as few embedded multipliers as possible is preferable.
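
One way to arrive at the quoted resource mix (3 multipliers, 6 adders, 2 squarers, 2 dividers) is to compute the numerator (a + ib)(c - id) with a three-multiplication complex product. The sketch below follows that idea; it is an assumption about how such counts can be reached, not a description of the authors' structure, and `complex_divide_3m` is a hypothetical name.

```python
def complex_divide_3m(a, b, c, d):
    """Return (re, im) of (a + ib) / (c + id).

    Uses 3 multiplications, 6 additions, 2 squarings and 2 divisions.
    c and d must not both be zero.
    """
    s1 = a + b            # adder 1
    s2 = c + d            # adder 2
    s3 = c - d            # adder 3
    k1 = c * s1           # multiplier 1
    k2 = a * s2           # multiplier 2
    k3 = b * s3           # multiplier 3
    norm = c * c + d * d  # 2 squarers, adder 4
    re = (k1 - k3) / norm  # adder 5, divider 1: (ac + bd) / (c^2 + d^2)
    im = (k1 - k2) / norm  # adder 6, divider 2: (bc - ad) / (c^2 + d^2)
    return re, im
```

The result can be compared against Python's built-in complex division, for example `(1 + 2j) / (3 + 4j)`.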

13

Some Schemes for Implementation of Arithmetic Operations with Complex Numbers Using Squaring Units

100%

Cariow A. , Cariowa G.

Measurement Automation Monitoring

|

2017

|

Vol. 63, No. 6

209--211

EN

In this paper, new schemes for a squarer, multiplier and divider of complex numbers are proposed. Traditional structures for each of these operations require a number of general-purpose binary multipliers. The advantage of our solutions is that multiplications are removed by replacing them with less costly squarers. We use Logan's trick and the quarter-square technique, which replace the calculation of the product of two real numbers by a sum of squares. Replacing ordinary multipliers with digital squarers reduces power consumption and decreases the complexity of the hardware circuit. Because a squarer requires less area and power than a general-purpose multiplier, it is of interest to assess the use of squarers in implementing complex arithmetic.
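
The quarter-square technique mentioned here rests on the identity xy = ((x + y)² - (x - y)²) / 4. The sketch below applies it to integer operands and builds a complex multiplication from squarings only. The helper names are hypothetical, and the arrangement is a plain illustration of the identity rather than any of the schemes proposed in the paper.

```python
def square(v):
    """Stand-in for a hardware squaring unit."""
    return v * v


def quarter_square_mul(x, y):
    """Multiply two integers using two squarings and no general multiplier.

    (x + y)^2 - (x - y)^2 equals 4xy, so the integer division is exact.
    """
    return (square(x + y) - square(x - y)) // 4


def complex_mul_squares(a, b, c, d):
    """Return (re, im) of (a + ib)(c + id) for integer parts.

    Every real product is replaced by a quarter-square computation.
    """
    re = quarter_square_mul(a, c) - quarter_square_mul(b, d)
    im = quarter_square_mul(a, d) + quarter_square_mul(b, c)
    return re, im
```

In hardware the division by 4 is a two-bit right shift, so only squarers, adders and wiring remain.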

14

A rationalized structure of processing unit to multiply 3x3 matrices

80%

Cariow A. , Sysło W. , Cariowa G. , Gliszczyński M.

Pomiary Automatyka Kontrola

|

2012

|

Vol. 58, No. 7

677-680

EN

This paper presents a high-speed parallel 3×3 matrix multiplier structure. To reduce the hardware complexity of the multiplier structure, we propose a modification of Makarov's algorithm for 3×3 by 3×3 matrix multiplication. The calculation of the matrix product is successively decomposed so that a minimal set of multipliers and fewer adders are used to generate partial results, which are then combined into the final results. Thus, the proposed modification reduces the number of adders compared to a direct implementation of Makarov's algorithm, and takes advantage of the parallelism of calculation offered by field-programmable gate arrays (FPGAs).

PL (translated)

This paper presents the structure of a processing unit for computing the product of two 3×3 matrices. In contrast to an implementation of the naive parallelization, which requires 27 multipliers, the proposed parallel structure requires only 22 multipliers. Because a multiplier consumes far more hardware resources of the implementation platform than an adder, minimizing the number of multipliers is a primary concern when designing microelectronic processing units. The proposed unit is built on the authors' modification of Makarov's method, whose implementation requires 38 fewer adders than an implementation of Makarov's method itself. The proposed structure can be used to accelerate computations in digital data processing subsystems implemented on FPGA platforms, and it can also be implemented in any other hardware environment, for example as an ASIC. In the latter case, a clear advantage of the solution is that a circuit designed in this way will consume less energy and dissipate less heat.

15

An FPGA-oriented fully parallel algorithm for multiplying dual quaternions

80%

Cariow A. , Cariowa G. , Witczak M.

Measurement Automation Monitoring

|

2015

|

Vol. 61, No. 7

370--372

EN

This paper presents a fully parallel algorithm with low multiplicative complexity for multiplying two dual quaternions. The “pen-and-paper” multiplication of two dual quaternions requires 64 real multiplications and 56 real additions. More efficient solutions do not yet exist. We show how to compute the product of two dual quaternions with 24 real multiplications and 64 real additions. In synthesizing the algorithm, we use the fact that the product of two dual quaternions can be represented as a matrix-vector product. The matrix in this product has unique structural properties that permit an advantageous factorization, and it is this factorization that significantly reduces the multiplicative complexity of dual quaternion multiplication. We show that with this approach the calculation of a dual quaternion product can be structured so that it ultimately requires only half the number of multipliers compared to a direct implementation of matrix-vector multiplication.

16

Hardware-Efficient Schemes of Quaternion Multiplying Units for 2D Discrete Quaternion Fourier Transform Processors

80%

Cariow A. , Cariowa G. , Chicheva M.

Measurement Automation Monitoring

|

2017

|

Vol. 63, No. 6

206--208

EN

In this paper, we present and discuss three efficient structural solutions for the hardware-oriented implementation of the basic operations of the discrete quaternion Fourier transform with reduced implementation complexity. The first solution is a scheme for calculating the sq product, the second is a scheme for calculating the qt product, and the third is a scheme for calculating the sqt product, where s is a so-called i-quaternion, t is a j-quaternion, and q is an ordinary quaternion. The direct multiplication of two ordinary quaternions requires 16 real multiplications (or two-operand multipliers in a fully parallel hardware implementation) and 12 real additions (or binary adders). Our solutions make it possible to design computation units that use only 6 multipliers and 6 two-input adders for the sq or qt basic operations, and 9 binary multipliers, 6 two-input adders and 4 four-input adders for the sqt basic operation.