Search results - Biblioteka Nauki

1

Hardware-Efficient Structure of the Accelerating Module for Implementation of Convolutional Neural Network Basic Operation

100%

Cariow A. , Cariowa G.

Measurement Automation Monitoring

|

2018

|

Vol. 64, No. 2

40--42

EN

This paper presents a structural design of a hardware-efficient module that implements the basic operation of a convolutional neural network (CNN) with reduced implementation complexity. For this purpose we use a modification of Winograd’s minimal filtering method together with computation vectorization principles. The module calculates the inner products of two consecutive segments of the original data sequence, formed by a sliding window of length 3, with the elements of a filter impulse response. A fully parallel structure that calculates these two inner products by the naïve method requires 6 binary multipliers and 4 binary adders. Using Winograd’s minimal filtering method makes it possible to construct a module structure that requires only 4 binary multipliers and 8 binary adders. Since a high-performance convolutional neural network can contain tens or even hundreds of such modules, such a reduction can have a significant effect.
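The multiplier and adder counts quoted in this abstract can be illustrated with the standard Winograd F(2,3) minimal filtering identities. The sketch below is a software illustration only, not the module structure from the paper. The function names are hypothetical, and the filter-side terms `u1` and `u2` are assumed to be precomputed because the filter coefficients are fixed.

```python
from fractions import Fraction


def naive_f23(d, g):
    """Two outputs of a 3-tap filter: 6 multiplications, 4 additions."""
    y0 = d[0] * g[0] + d[1] * g[1] + d[2] * g[2]
    y1 = d[1] * g[0] + d[2] * g[1] + d[3] * g[2]
    return y0, y1


def winograd_f23(d, g):
    """Same two outputs with 4 multiplications and 8 data/output additions."""
    g0, g1, g2 = (Fraction(v) for v in g)
    # Filter-side terms: assumed precomputed (fixed filter), so not counted.
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    # 4 multiplications; the 4 additions in parentheses act on the data.
    m1 = (d[0] - d[2]) * g0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * g2
    # 4 more additions combine the products into the two outputs.
    return m1 + m2 + m3, m2 - m3 - m4


print(naive_f23([1, 2, 3, 4], [1, 1, 1]))
print(winograd_f23([1, 2, 3, 4], [1, 1, 1]))
```

Exact rational arithmetic (`Fraction`) is used here only so that the halved filter terms compare exactly with the naïve result.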

2

Low complexity algorithm for multiplying octonions

100%

Cariow A. , Cariowa G.

Przegląd Elektrotechniczny

|

2014

|

Vol. 90, No. 2

109--112

EN

We propose an original algorithmic solution for the multiplication of octonions. In previously published algorithms for computing the product of octonions, the number of multiplications was reduced at the cost of a significant increase in the number of additions and shifts. The advantage of the proposed solution is that it reduces the number of multiplications needed to calculate the product of octonions by 25% compared with the naive method, while the number of additions remains the same as in the naive method. In synthesizing the algorithm we use the fact that the octonion product may be represented as a matrix-vector product. This representation makes it possible to discover repeating elements in the matrix structure and to use specific properties of their mutual placement to reduce the number of real multiplications needed to calculate the octonion product.

PL

The paper presents a fast algorithm for computing the product of octonions. Compared with the naive algorithm, it requires 25% fewer multiplications while keeping the same number of real additions.

3

A rationalized algorithm for complex-valued inner product calculation

100%

Cariow A. , Cariowa G.

Pomiary Automatyka Kontrola

|

2012

|

Vol. 58, No. 7

674-676

EN

This paper presents a rationalized algorithm for calculating a complex-valued inner product. The algorithm is built on the well-known possibility of calculating the product of two complex numbers with three multiplications and five additions of real numbers. Thus, the proposed algorithmic solution reduces the number of real multiplications and additions compared to the schoolbook implementation, and takes advantage of the parallelization of calculations offered by field-programmable gate arrays (FPGAs).

PL

The paper presents a parallel algorithm for computing the inner product of two vectors whose elements are complex numbers. Compared with a fully parallel implementation of the naive method, the proposed algorithm has reduced multiplicative complexity. Whereas the naive method requires 4N multiplications (multipliers in a hardware implementation) and 2(2N-1) additions (adders) of real numbers, the proposed algorithm requires only 3N multiplications and 6N-1 additions. The paper presents a rationalized vector-matrix computational procedure for calculating such products and defines the matrix constructions that make up this procedure. In a hardware implementation, the proposed algorithm has clear advantages over the naive parallelization, which requires more multiplier blocks. Because a multiplier block consumes considerably more hardware resources of the implementation platform than an adder, reducing the number of these blocks when designing computational units is a highly topical issue. When an inner product unit is implemented in an FPGA, the proposed solution saves part of the pool of multiplier blocks, or logic elements, available on the chip.
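One arrangement that reaches the quoted counts of 3N multiplications and 6N-1 additions is sketched below. It applies the three-multiplication complex product to every term and accumulates the three partial products separately, so the two output additions are paid only once. This is an assumption about how the counts can be met, not the vector-matrix procedure defined in the paper, and the function name is hypothetical.

```python
def complex_inner_product_3mult(x, y):
    """Inner product sum(x[n] * y[n]) of two lists of (re, im) pairs.

    Per term: 3 real multiplications and 3 pre-additions.
    In hardware, each of the three running sums over N terms needs N-1
    adders, and 2 final additions form the result:
    3N + 3(N-1) + 2 = 6N-1 additions in total.
    (Starting the sums at 0 below is a software convenience.)
    """
    s1 = s2 = s3 = 0
    for (a, b), (c, d) in zip(x, y):
        s1 += c * (a + b)  # p1 = c(a + b)
        s2 += a * (d - c)  # p2 = a(d - c)
        s3 += b * (c + d)  # p3 = b(c + d)
    # Re = sum(p1) - sum(p3), Im = sum(p1) + sum(p2)
    return s1 - s3, s1 + s2


x = [(1, 2), (3, 4)]
y = [(5, 6), (7, 8)]
print(complex_inner_product_3mult(x, y))  # (1+2i)(5+6i) + (3+4i)(7+8i)
```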

4

Koncepcyjne aspekty projektowania inteligentnego umundurowania dla osób prowadzących akcje ratownicze i interwencyjne

100%

Kraszewski J. , Cariow A.

Pomiary Automatyka Kontrola

|

2012

|

Vol. 58, No. 7

687-689

PL

The paper describes conceptual aspects of designing a smart uniform intended to support rescue operations. The uniform consists of a helmet, textronic clothing and a bracelet. The helmet is equipped with two cameras monitoring the space in front and behind. Sensors of physiological and environmental parameters, together with the electronics and power module, are placed on the uniform. The bracelet is used to track the wearer's position and current state (movement, stillness, fall).

EN

This paper describes conceptual aspects of designing smart uniforms to assist rescue and emergency operations. It is assumed that a uniform will consist of a helmet, textronic clothing and a bracelet. These components of the uniform are shown in Fig. 1. The helmet is equipped with two video cameras monitoring the space in front of and behind the wearer. The clothing includes a variety of optional features for each type of intervention team, i.e. various sensors of physiological and environmental parameters and an electronic module that collects information and exchanges it in both directions with the command center of the action through appropriate transmission channels. The bracelet is used to track the location and current state (movement, stillness, and fall) of a person intervening in the monitored area of action. These basic and optional components of the uniform are presented in Tab. 1. Additionally, the system is complemented by external equipment that assists in collecting distorted signals in a lossy environment. The biggest obstacle in the development of smart clothing remains the repeated washing of fabrics that contain electronic elements. The advantage of the products currently in use is that they treat the uniform as a comprehensive protection system. A smart uniform is not just another gadget but equipment that protects the life and health of its users, although it may still be subject to fashion trends [14], utility and styling. Such clothing should not cause discomfort to the user, but should support and facilitate operation in hazardous conditions.

5

Hardware-efficient algorithms for implementation of the GHM discrete multiwavelet transform kernels

100%

Cariow A. , Cariowa G.

Measurement Automation Monitoring

|

2016

|

Vol. 62, No. 6

190--192

EN

In this correspondence, we discuss two efficient algorithms, with reduced computational complexity, for executing the basic operations of the forward (FDMWT) and inverse (IDMWT) discrete multiwavelet transforms. We use the multiwavelet basis proposed by Geronimo, Hardin, and Massopust (GHM). The direct implementation of the GHM-FDMWT basic operation requires 23 multiplications and 19 additions. The direct implementation of the GHM-IDMWT basic operation requires 23 multiplications and 16 additions. By contrast, our solutions allow computation procedures to be designed that take only 10 multiplications plus 15 additions for the GHM-FDMWT basic operation and 10 multiplications plus 10 additions for the GHM-IDMWT basic operation.

6

An algorithm for complex-valued vector-matrix multiplication

100%

Cariow A. , Cariowa G.

Przegląd Elektrotechniczny

|

2012

|

Vol. 88, No. 10b

213-216

EN

In this note we present an algorithm for calculating the vector-matrix product for vectors and matrices whose elements are complex numbers.

PL

The paper presents a rationalized algorithm for computing the vector-matrix product for complex-valued data. Compared with the naive method, the proposed algorithm has reduced multiplicative complexity. Whereas the naive method requires 4MN multiplications and 2M(2N-1) additions of real numbers, the proposed algorithm requires only 3MN multiplications and N+M(5N-1) additions.
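The counts 3MN and N+M(5N-1) can be reproduced by a simple arrangement in which the pre-addition on each vector element is shared by all M rows. The sketch below counts operations while it computes. It is one way to arrive at these numbers under that assumption, not necessarily the procedure from the paper, and the function name and the `counts` dictionary are hypothetical.

```python
def complex_matvec_3mult(A, x):
    """Multiply an M x N complex matrix A by a length-N complex vector x.

    Entries are (re, im) pairs. Returns (y, counts), where counts tallies
    the real multiplications and additions a fully parallel unit would need.
    """
    counts = {"mul": 0, "add": 0}
    # N shared pre-additions a + b, one per vector element.
    s = []
    for a, b in x:
        s.append(a + b)
        counts["add"] += 1
    y = []
    for row in A:
        sums = None
        for (c, d), (a, b), ab in zip(row, x, s):
            p = (c * ab, a * (d - c), b * (c + d))  # 3 mults, 2 pre-adds
            counts["mul"] += 3
            counts["add"] += 2
            if sums is None:
                sums = list(p)  # first term: nothing to accumulate yet
            else:
                sums = [u + v for u, v in zip(sums, p)]
                counts["add"] += 3
        # Re = sum(p1) - sum(p3), Im = sum(p1) + sum(p2): 2 additions per row.
        y.append((sums[0] - sums[2], sums[0] + sums[1]))
        counts["add"] += 2
    return y, counts


A = [[(1, 2), (3, 4)], [(0, 1), (1, 0)]]
x = [(5, 6), (7, 8)]
print(complex_matvec_3mult(A, x))
```

For M = N = 2 the tally is 12 multiplications and 20 additions, which agrees with 3MN and N+M(5N-1).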

7

A hardware-oriented algorithm for complex-valued constant matrix-vector multiplication

100%

Cariow A. , Cariowa G.

Przegląd Elektrotechniczny

|

2017

|

Vol. 93, No. 1

87--90

EN

In this communication we present a hardware-oriented algorithm for calculating a constant matrix-vector product when all elements of the vector and the matrix are complex numbers. The main idea behind our algorithm is to combine the advantages of Winograd’s inner product formula with Gauss's trick for complex number multiplication. Compared with the naïve method, the proposed algorithm drastically reduces the number of multipliers required for an FPGA implementation of complex-valued constant matrix-vector multiplication. Whereas a fully parallel hardware implementation of the naïve (schoolbook) method for complex-valued matrix-vector multiplication requires 4MN multipliers, 2M N-input adders and 2MN two-input adders, the proposed algorithm requires only 3N(M+1)/2 multipliers, [3M(N+2)+1.5N+2] two-input adders and 3(M+1) N/2-input adders.

PL

This communication presents a hardware-oriented algorithm for multiplying a constant matrix by a variable vector, assuming that both the matrix elements and the vector elements are complex numbers. The main idea of the proposed algorithm is the joint use of Winograd's formula for computing the inner product and Gauss's formula for complex number multiplication. Compared with the traditional way of carrying out the computation, the proposed algorithm reduces the number of multipliers needed for a fully parallel FPGA implementation of a vector-matrix product unit. Whereas a fully parallel implementation of the traditional method requires 4MN multiplier blocks, 2M N-input adders and 2MN two-input adders, the proposed algorithm requires only 3N(M+1)/2 multiplier blocks, [3M(N+2)+1.5N+2] two-input adders and 3(M+1) N/2-input adders.

8

Basic Aspects of Designing a High-performance Processor Structure for Calculating a "true" Discrete Fractional Fourier Transform

100%

Cariow A. , Majorkowska-Mech D.

Measurement Automation Monitoring

|

2018

|

Vol. 64, No. 2

43--45

EN

This paper presents basic aspects of the structural design of a high-performance processor for implementing the discrete fractional Fourier transform (DFrFT). The general idea of parallelizing the calculation of the so-called “true” discrete Fourier transform on the basis of our previously developed algorithmic approach is presented. We focus only on the general aspects of organizing the structure of such a processor, since the details of a particular implementation always depend on the implementation platform used, while the general idea of constructing the processor structure remains unchanged.

9

An algorithm for multiplication of Dirac numbers

100%

Cariow A. , Cariowa G.

Journal of Theoretical and Applied Computer Science

|

2013

|

Vol. 7, No. 4

26--34

EN

In this work a rationalized algorithm for the multiplication of Dirac numbers is presented. The algorithm has low computational complexity and is well suited to parallelization. Computing the product of two Dirac numbers by the naïve method takes 256 real multiplications and 240 real additions, while the proposed algorithm computes the same result in only 128 real multiplications and 160 real additions. In synthesizing the algorithm we use the fact that the product of Dirac numbers may be represented as a vector-matrix product. The matrix participating in the product has unique structural properties that allow it to be decomposed advantageously. It is this decomposition that significantly reduces the computational complexity.

10

A parallel hardware-oriented algorithm for constant matrix-vector multiplication with reduced multiplicative complexity

100%

Cariow A. , Cariow G.

Pomiary Automatyka Kontrola

|

2014

|

Vol. 60, No. 7

510--512

EN

This paper presents the algorithmic aspects of organizing a low-complexity, fully parallel processor unit for computing constant matrix-vector products. To reduce the hardware complexity (the number of two-operand multipliers), we exploit Winograd’s approach to inner product calculation. We show that, with this approach, the calculation of the constant matrix-vector product can be structured so that it requires fewer multipliers than the direct implementation of matrix-vector multiplication.

PL

The paper presents a hardware-oriented algorithm for computing the product of a vector and a constant matrix. Unlike an implementation of the naive parallelization, which requires N^2 multipliers, the proposed parallel structure requires only N(M+1)/2 multipliers. Because a multiplier consumes considerably more hardware resources of the implementation platform than an adder, minimizing the number of multipliers is of paramount importance when designing dedicated computational circuits. The synthesis of the algorithm is based on using S. Winograd's method to compute the partial inner products. The presented algorithm can be used successfully to accelerate computations in digital data processing subsystems built on FPGA platforms, and it can be implemented in any hardware environment, for example as an ASIC. In the latter case a clear advantage of the presented solution is that a circuit designed in this way will consume less energy and dissipate less heat.
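The multiplier count N(M+1)/2 follows from Winograd's inner product identity when the matrix is constant. The products of paired matrix entries can be precomputed offline, and the products of paired vector entries are computed once and shared by all rows. The sketch below illustrates this for an M x N real matrix with even N. It is not the processor structure from the paper, and the function name is hypothetical.

```python
def winograd_const_matvec(A, x):
    """Compute A @ x for a constant M x N matrix A (N even).

    Returns (y, mults), where mults counts run-time multiplications only.
    """
    n = len(x)
    assert n % 2 == 0, "Winograd pairing in this sketch assumes even N"
    half = n // 2
    mults = 0
    # Row terms depend only on the constant matrix: assumed precomputed
    # offline, so they are not counted as run-time multiplications.
    row_terms = [sum(r[2 * k] * r[2 * k + 1] for k in range(half)) for r in A]
    # Vector term: N/2 multiplications, shared by every row.
    x_term = 0
    for k in range(half):
        x_term += x[2 * k] * x[2 * k + 1]
        mults += 1
    y = []
    for r, r_term in zip(A, row_terms):
        acc = 0
        for k in range(half):
            # (a0 + x1)(a1 + x0) = a0*a1 + a0*x0 + a1*x1 + x0*x1
            acc += (r[2 * k] + x[2 * k + 1]) * (r[2 * k + 1] + x[2 * k])
            mults += 1
        y.append(acc - r_term - x_term)
    return y, mults


A = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
x = [1, 0, 2, 1]
print(winograd_const_matvec(A, x))
```

For M = 3 and N = 4 the sketch uses 8 run-time multiplications, against 12 for the direct row-by-row product.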

11

A fast algorithm for multiresolution discrete Fourier transform

100%

Andreatto B. , Cariow A.

Przegląd Elektrotechniczny

|

2012

|

Vol. 88, No. 11a

66-69

EN

The paper presents a fast algorithm for calculating a multiresolution discrete Fourier transform. The approach is based on performing the Fast Fourier Transform at each frequency resolution level. The algorithm reduces the number of complex multiplications and additions compared with directly multiplying the matrix of discrete exponential functions by the input signal expressed as a column vector.

PL

The paper presents a fast algorithm for computing the multiresolution discrete Fourier transform. The approach is based on performing the fast Fourier transform algorithm at each of the analysed frequency resolution levels.

12

An algorithm for discrete fractional Hadamard transform with reduced arithmetical complexity

100%

Majorkowska-Mech D. , Cariow A.

Przegląd Elektrotechniczny

|

2012

|

Vol. 88, No. 11a

70-76

EN

This paper presents an algorithm for computing the discrete fractional Hadamard transform of an input vector of length 2^n. The algorithm significantly reduces the number of arithmetic operations by taking advantage of the specific structure of the discrete fractional Hadamard transformation matrix.

PL

The paper presents an algorithm for computing the discrete fractional Hadamard transform of an input data vector of size 2^n. The algorithm significantly reduces the number of arithmetic operations by exploiting the specific structure of the discrete fractional Hadamard transformation matrix.

13

Generalized multiresolution discrete orthogonal transforms

100%

Andreatto B. , Cariow A.

Pomiary Automatyka Kontrola

|

2013

|

Vol. 59, No. 8

830--832

EN

This paper presents the idea of multiresolution discrete orthogonal transforms. One possible way to realize such a multiresolution transform is to implement a rationalized algorithm for computing the coefficients that form the consecutive resolution levels. The paper also presents an example of the synthesis of a fast algorithm for computing the coefficients of the multiresolution discrete Hartley transform. The computational procedures are described in vector-matrix notation.

PL

The paper presents a generalized multiresolution discrete orthogonal transformation. The transformation defined in this work allows a signal to be analysed at many resolution levels. These levels are formed by the frequency coefficients obtained by applying fast discrete orthogonal transforms, e.g. the discrete Fourier transform (DFT), the discrete cosine transform (DCT), the discrete Hartley transform, or the discrete slant transform, to consecutive fragments of the examined signal. The scheme presented in this paper is valid for signals whose number of samples is a natural power of two. Using fast algorithms for the individual transforms at successive resolution levels yields a significant reduction in the number of arithmetic operations compared with the method based on direct multiplication of the basis matrix by the column vector of input data. The computational procedures are described using vector-matrix calculus, which is well suited to describing the space-time structures of computational processes and also allows these structures to be mapped directly onto software and hardware implementations. The paper also presents an example of the synthesis of a fast algorithm for the multiresolution discrete Hartley transform of a one-dimensional signal with eight samples.

14

An unified approach for developing rationalized algorithms for hypercomplex number multiplication

100%

Cariow A. , Cariowa G.

Przegląd Elektrotechniczny

|

2015

|

Vol. 91, No. 2

36-39

EN

In this article we present a common approach to the development of algorithms for calculating products of hypercomplex numbers. The main idea of the proposed approach is to represent the multiplication of hypernumbers as a matrix-vector product and then to decompose the matrix creatively, which reduces the arithmetical complexity of the calculations. The proposed approach allows reasonably good algorithms for hypernumber multiplication with reduced computational complexity to be constructed. Whereas the schoolbook method requires N^2 real multiplications and N(N-1) real additions, the proposed approach makes it possible to develop algorithms that take only [N(N-1)/2]+2 real multiplications and 3N log_2 N+[N(N-3)+4]/2 real additions.
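The two pairs of operation-count formulas quoted in this abstract can be evaluated directly to see how the savings grow with the dimension N. The short sketch below only evaluates those formulas and implements no multiplication algorithm. The function names are hypothetical, and N is assumed to be a power of two so that log_2 N is an integer.

```python
def schoolbook_counts(n):
    """(multiplications, additions) for the schoolbook method: N^2 and N(N-1)."""
    return n * n, n * (n - 1)


def rationalized_counts(n):
    """(multiplications, additions) from the formulas quoted in the abstract.

    Assumes n is a power of two, so log2(n) is an integer.
    """
    log2n = n.bit_length() - 1
    assert n == 1 << log2n, "this sketch assumes N is a power of two"
    mults = n * (n - 1) // 2 + 2
    adds = 3 * n * log2n + (n * (n - 3) + 4) // 2
    return mults, adds


for n in (4, 8, 16, 32):
    print(n, schoolbook_counts(n), rationalized_counts(n))
```

For N = 32 the multiplication formula gives 498, against 1024 for the schoolbook method.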

PL

The paper presents a generalized approach to the synthesis of algorithms for computing products of hypercomplex numbers. The main idea of the proposed approach is to represent the multiplication of hypercomplex numbers as a vector-matrix product and then to decompose the matrix factor creatively, which reduces the computational complexity. The proposed approach allows algorithms to be built that have reduced computational complexity compared with the naive method. Whereas the naive method requires N^2 multiplications and N(N-1) additions of real numbers, the proposed approach makes it possible to synthesize algorithms requiring only [N(N-1)/2]+2 multiplications and 3N log_2 N+[N(N-3)+4]/2 additions.

15

Representation of sedenions multiplication via matrix-vector product

100%

Cariowa G. , Cariow A.

Metody Informatyki Stosowanej

|

2011

|

No. 1

133-139

EN

The article shows how to represent the multiplication of two sedenions as a vector-matrix product. Matrix algebra offers not only a formalism for describing the algorithm, but also enables the derivation, by purely algebraic manipulations, of an algorithm that is well suited to implementation in vector and matrix digital data processors with various levels of parallelism. In addition, the procedures can be used directly for easy implementation in matrix-oriented languages such as Matlab.
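The idea of writing a hypercomplex product as a matrix-vector product can be sketched with the Cayley-Dickson construction. Because the product is bilinear, column j of the left-multiplication matrix of x is simply x times the j-th basis element. The sign convention (a, b)(c, d) = (ac - d*b, da + bc*) is an assumption of this sketch, and all function names are hypothetical. The matrix derived in the article may follow a different convention.

```python
def conj(x):
    """Cayley-Dickson conjugate: (a, b)* = (a*, -b)."""
    if len(x) == 1:
        return list(x)
    h = len(x) // 2
    return conj(x[:h]) + [-v for v in x[h:]]


def cd_mul(x, y):
    """Cayley-Dickson product of two vectors of equal power-of-two length."""
    n = len(x)
    if n == 1:
        return [x[0] * y[0]]
    h = n // 2
    a, b, c, d = x[:h], x[h:], y[:h], y[h:]
    # Assumed convention: (a, b)(c, d) = (ac - d*b, da + bc*)
    first = [p - q for p, q in zip(cd_mul(a, c), cd_mul(conj(d), b))]
    second = [p + q for p, q in zip(cd_mul(d, a), cd_mul(b, conj(c)))]
    return first + second


def left_matrix(x):
    """Matrix M(x) with M(x) @ y == x * y; column j is x * e_j."""
    n = len(x)
    cols = [cd_mul(x, [int(i == j) for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]


def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


# Two sedenions (16 real components each) with small integer entries.
x = list(range(1, 17))
y = list(range(16, 0, -1))
print(matvec(left_matrix(x), y) == cd_mul(x, y))
```

Evaluated directly, the 16 x 16 matrix-vector product costs 256 real multiplications.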

16

An algorithm for multiplication of trigintaduonions

100%

Cariow A. , Cariowa G.

Journal of Theoretical and Applied Computer Science

|

2014

|

Vol. 8, No. 1

50--75

EN

In this paper we introduce an efficient algorithm for the multiplication of trigintaduonions. The direct multiplication of two trigintaduonions requires 1024 real multiplications and 992 real additions. We show how to compute a trigintaduonion product with 498 real multiplications and 943 real additions. In synthesizing the algorithm we use the fact that trigintaduonion multiplication may be represented by a vector-matrix product. This representation makes it possible to discover repeating elements in the matrix structure and to use specific properties of their mutual placement to decrease the number of real multiplications needed to compute the product of two trigintaduonions.

17

Some Schemes for Implementation of Arithmetic Operations with Complex Numbers Using Squaring Units

100%

Cariow A. , Cariowa G.

Measurement Automation Monitoring

|

2017

|

Vol. 63, No. 6

209--211

EN

In this paper, new schemes for a squarer, multiplier and divider of complex numbers are proposed. Traditional structural solutions for each of these operations require some number of general-purpose binary multipliers. The advantage of our solutions is that multiplications are eliminated by replacing them with less costly squarers. We use Logan's trick and the quarter-square technique, which replace the calculation of the product of two real numbers by a sum of squares. Replacing ordinary multipliers with digital squarers reduces power consumption and decreases the complexity of the hardware circuit. Since a squarer requires less area and power than a general-purpose multiplier, it is worth assessing the use of squarers in implementing complex arithmetic.
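The quarter-square technique mentioned in this abstract rests on the identity ab = ((a+b)^2 - (a-b)^2)/4, which trades one multiplication for two squarings. The sketch below applies it to each real product of a schoolbook complex multiplication. It illustrates the identity only and is not one of the schemes proposed in the paper. The function names are hypothetical, and integer operands are assumed so that the division by 4 is exact.

```python
def quarter_square_mul(a, b):
    """Product of two integers using two squarings instead of a multiplier.

    (a + b)^2 - (a - b)^2 equals 4ab, so the integer division is exact.
    """
    return ((a + b) ** 2 - (a - b) ** 2) // 4


def complex_mul_with_squarers(x, y):
    """(a + bi)(c + di) with every real product formed by quarter squares."""
    (a, b), (c, d) = x, y
    re = quarter_square_mul(a, c) - quarter_square_mul(b, d)
    im = quarter_square_mul(a, d) + quarter_square_mul(b, c)
    return re, im


print(quarter_square_mul(-7, 6))
print(complex_mul_with_squarers((1, 2), (3, 4)))
```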

18

A Hardware-Efficient Structure of Complex Numbers Divider

100%

Cariow A. , Cariowa G.

Measurement Automation Monitoring

|

2017

|

Vol. 63, No. 6

212--213

EN

This correspondence presents an efficient approach to structuring a hardware accelerator that calculates the quotient of two complex numbers with a reduced number of underlying binary multipliers. A fully parallel implementation of complex-number division using the conventional structure requires 4 multipliers, 3 adders, 2 squarers and 2 dividers, while the proposed structure requires only 3 multipliers, 6 adders, 2 squarers and 2 dividers. Because the hardware complexity of a binary multiplier grows quadratically with operand size, while the hardware complexity of a binary adder increases linearly with operand size, a complex-number divider structure containing as few embedded multipliers as possible is preferable.
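One arrangement consistent with the quoted count of 3 multipliers, 6 adders, 2 squarers and 2 dividers is sketched below. The numerator (a+bi)(c-di) is formed with the three-multiplication complex product, the denominator c^2+d^2 uses the two squarers, and each component is then divided separately. This is an assumption about how the counts can be met, not the structure described in the paper. The function name is hypothetical, and integer inputs are assumed so that `Fraction` gives exact quotients.

```python
from fractions import Fraction


def complex_divide_3mult(x, y):
    """(a + bi) / (c + di) for integer components, returned as exact fractions."""
    (a, b), (c, d) = x, y
    # 3 multiplications, each fed by one pre-addition (3 adders).
    p1 = c * (a + b)
    p2 = a * (c + d)
    p3 = b * (c - d)
    # 2 more adders form the numerator (a + bi)(c - di).
    re_num = p1 - p3  # = ac + bd
    im_num = p1 - p2  # = bc - ad
    # 2 squarers and the sixth adder form the denominator.
    denom = c * c + d * d
    # 2 dividers.
    return Fraction(re_num, denom), Fraction(im_num, denom)


print(complex_divide_3mult((1, 2), (3, 4)))
```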

19

A rationalized structure of processing unit to multiply 3x3 matrices

80%

Cariow A. , Sysło W. , Cariowa G. , Gliszczyński M.

Pomiary Automatyka Kontrola

|

2012

|

Vol. 58, No. 7

677-680

EN

This paper presents a high-speed parallel 3x3 matrix multiplier structure. To reduce the hardware complexity of the multiplier structure, we propose a modification of Makarov's algorithm for 3x3 by 3x3 matrix multiplication. The calculation of the matrix product is successively decomposed so that a minimal set of multipliers and fewer adders are used to generate partial results, which are then combined into the final results. Thus, the proposed modification reduces the number of adders compared to the direct implementation of Makarov's algorithm, and takes advantage of the parallelism offered by field-programmable gate arrays (FPGAs).

PL

The paper presents the structure of a processor unit for computing the product of two third-order matrices. Unlike an implementation of the naive parallelization, which requires 27 multipliers, the proposed parallel structure requires only 22 multipliers. Because a multiplier consumes considerably more hardware resources of the implementation platform than an adder, minimizing the number of multipliers is of paramount importance when designing microelectronic processor units. The proposed unit is based on the authors' modification of Makarov's method, and an implementation of this modification requires 38 fewer adders than an implementation of Makarov's method. The proposed structure can be used successfully to accelerate computations in digital data processing subsystems built on FPGA platforms, and it can be implemented in any hardware environment, for example as an ASIC. In the latter case a clear advantage of the presented solution is that a circuit designed in this way will consume less energy and dissipate less heat.

20

Hardware-Efficient Schemes of Quaternion Multiplying Units for 2D Discrete Quaternion Fourier Transform Processors

80%

Cariow A. , Cariowa G. , Chicheva M.

Measurement Automation Monitoring

|

2017

|

Vol. 63, No. 6

206--208

EN

In this paper, we offer and discuss three efficient structural solutions for the hardware-oriented implementation of the basic operations of the discrete quaternion Fourier transform with reduced implementation complexity. The first solution is a scheme for calculating the sq product, the second is a scheme for calculating the qt product, and the third is a scheme for calculating the sqt product, where s is a so-called i-quaternion, t is a j-quaternion, and q is an ordinary quaternion. The direct multiplication of two ordinary quaternions requires 16 real multiplications (or two-operand multipliers in the case of a fully parallel hardware implementation) and 12 real additions (or binary adders). By contrast, our solutions make it possible to design computation units that use only 6 multipliers and 6 two-input adders for the sq or qt basic operations, and 9 binary multipliers, 6 two-input adders and 4 four-input adders for the sqt basic operation.