Search results - BazTech

1

Realization of multiplexer logic-based 2-D block FIR filter using distributed arithmetic

Chowdari Ch. Pratyusha, Seventline J. Beatrice

Computer Assisted Methods in Engineering and Science

|

2023

|

Vol. 30, no. 1

89-103

EN

This paper presents a novel systolic two-dimensional (2D) block finite impulse response (FIR) filter architecture using a distributed arithmetic (DA)-based multiplexer look-up table (DA-MUX-LUT). The proposed DA-MUX-LUT architecture computes the instantaneous partial product using the bit vector. The switching-based LUT replaces memory-based structures and reduces hardware complexity. Block processing allows memory reuse, which reduces the number of registers needed to store the previous input samples. Parallel adders are substituted by a modified carry look-ahead adder (MCLA), which minimizes the delay. Moreover, a resource-sharing concept is introduced to the DA-MUX-LUT block that drastically reduces the adder requirement. The application-specific integrated circuit (ASIC) synthesis results show that the proposed DA-MUX-LUT-based 2-D block FIR filter for filter size 8x8 and block size 4 has 31.22% less delay, 28.66% less area-delay product (ADP), 37.70% less power-delay product (PDP), and occupies almost the same area as the existing architecture.

2

High Performance DIF-FFT Using Dissimilar Partitioned LUT Based Distributed Arithmetic

Cheepurupalli Kusma Kumari, Charan Muntha, Rao Jammu Bhaskara, Noor Mahammad S.

International Journal of Electronics and Telecommunications

|

2021

|

Vol. 67, No. 4

631-637

EN

Real-time data processing systems utilize Digital Signal Processing (DSP) functions as the base modules. Most DSP functions involve an implementation of the Fast Fourier Transform (FFT) to convert signals from one domain to another. The major bottleneck of a Decimation in Frequency Fast Fourier Transform (DIF-FFT) implementation lies in the number of multipliers it uses. Distributed arithmetic (DA) is considered one of the efficient techniques for implementing DIF-FFT, because in this approach multipliers are not used. The proposed technique exploits the look-up table by storing the twiddle factors in it, thereby avoiding the multipliers required in the butterfly structure. DIF-FFT using Distributed Arithmetic (DIF-FFT DA) models with different adders, namely the ripple-carry adder (RCA), the carry-lookahead adder (CLA), and the Sklansky prefix graph adder, are proposed in this paper. The three proposed models are synthesized using Cadence 6.1 EDA tools with a 45 nm CMOS technology. Compared to the traditional method, it is observed that the area is improved by 53.11%, 53.35%, and 50.15%, power is improved by 42.31%, 42.52%, and 40.39%, and delay is improved by 45.26%, 45.42%, and 41.80%, respectively.

3

Application of Modified Distributed Arithmetic Concept in FIR Filter Implementations Targeted at Heterogeneous FPGAs

Staworko M., Rawski M.

Przegląd Elektrotechniczny

|

2012

|

R. 88, nr 6

240-246

EN

Distributed arithmetic is a very efficient method for implementing digital FIR filters in FPGA structures. In this approach, the general-purpose multipliers of traditional MAC implementations are replaced by combinational LUT blocks. Since LUT blocks can be of considerable size, the quality of a digital filter implementation depends heavily on the efficiency of the logic synthesis algorithm that maps them into FPGA resources. Because modern FPGAs have a heterogeneous structure, there is a need for quality algorithms that target these structures and for flexible architecture exploration that aids appropriate mapping. The paper presents an application of the modified distributed arithmetic concept that allows for very efficient implementation of FIR filters in heterogeneous FPGA architectures.

PL

Arytmetyka rozproszona jest bardzo wydajną metodą implementacji filtrów SOI w układach FPGA. Pozwala na zastąpienie kosztownych układów mnożących tablicami prawdy (LUT). Dla filtrów wysokich rzędów tablice LUT osiągają wielkie rozmiary, dlatego jakość implementacji filtru zależy głównie od jakości dekompozycji tej tablicy. Artykuł przedstawia nową metodę dekompozycji tablic LUT filtrów SOI dedykowaną do heterogenicznych struktur rekonfigurowalnych.

4

Modeling the Arithmetic Decomposition of DA-LUT Block for Heterogeneous FPGA Structures

Staworko M., Rawski M.

International Journal of Electronics and Telecommunications

|

2012

|

Vol. 58, No. 4

335-344

EN

Distributed arithmetic is a well-known technique for designing FIR filters in FPGA devices. The quality of such a filter implementation depends strongly on the synthesis results of the DA-LUT block. The heterogeneity of modern FPGA structures introduces new possibilities into the implementation process, which may lead to better results but also makes it more complicated. This paper presents a simple mathematical model for estimating the FPGA resources necessary to implement a DA-LUT using a decomposition-based approach. The model takes into account the type of logic cells or memory blocks used in the decomposition process. The proposed model is helpful in determining the DA-LUT decomposition strategy for further automation of the modified distributed arithmetic decomposition method.

5

Modified Distributed Arithmetic Concept for Implementations Targeted at Heterogeneous FPGAs

Rawski M.

International Journal of Electronics and Telecommunications

|

2010

|

Vol. 56, No. 4

345-350

EN

Distributed Arithmetic (DA) plays an important role in designing digital signal processing modules for FPGA architectures. It allows replacing multiply-and-accumulate (MAC) operations with combinational blocks. The quality of DA-based implementations depends strongly on the efficiency of the methods that map the combinational DA block into FPGA resources. Since modern FPGAs have a heterogeneous structure, there is a need for quality algorithms that target these structures and for flexible architecture exploration that aids appropriate mapping. The paper presents a modification of the DA concept that allows for very efficient implementation in heterogeneous FPGA architectures.
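The replacement of MAC operations by a look-up table, which several abstracts in this list rely on, can be illustrated with a minimal bit-serial sketch. It is a textbook form of distributed arithmetic, not the modified concept of the paper above; the function names `build_da_lut` and `da_dot` and the 8-bit sample width are assumptions made for illustration only.

```python
def build_da_lut(coeffs):
    """Precompute, for every K-bit address, the sum of the selected coefficients."""
    k = len(coeffs)
    return [
        sum(c for j, c in enumerate(coeffs) if (addr >> j) & 1)
        for addr in range(1 << k)
    ]


def da_dot(coeffs, samples, width=8):
    """Inner product of constant coeffs with signed `width`-bit samples.

    No multiplication of a coefficient by a sample is performed: each step
    reads one LUT entry, shifts it, and accumulates it.
    """
    lut = build_da_lut(coeffs)
    mask = (1 << width) - 1
    acc = 0
    for bit in range(width):
        # The LUT address collects bit number `bit` of every sample
        # (samples are viewed in two's complement).
        addr = 0
        for j, x in enumerate(samples):
            addr |= (((x & mask) >> bit) & 1) << j
        term = lut[addr] << bit
        # The sign bit has negative weight in two's complement.
        acc += -term if bit == width - 1 else term
    return acc
```

For K coefficients the table has 2^K entries, which is why the abstracts in this list that deal with LUT decomposition and partitioning matter for high-order filters.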

6

Arytmetyka rozproszona w syntezie filtrów cyfrowych

Nowicka M., Tomaszewicz P., Zbierzchowski B.

Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne

|

2006

|

nr 1

7-11

PL

Omówiono zastosowanie nowych metod syntezy logicznej do projektowania filtrów cyfrowych w strukturach FPGA. Przedstawiono podstawy arytmetyki rozproszonej oraz metodę obliczania tablic LUT. Podano wyniki eksperymentów projektowych.

EN

The paper discusses the application of modern logic synthesis methods to the design of digital filters in FPGA architectures. The basics of distributed arithmetic and a method for computing the LUT contents are presented. Experimental results are also shown.

7

O implementacjach sprzętowych transformacji falkowej

Rakowski W.

Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne

|

2003

|

nr 2-3

77-81

PL

Zawarto krótkie wprowadzenie do reprezentacji falkowej sygnałów. Opisano układ filtrów cyfrowych umożliwiający dekompozycję i rekonstrukcję falkową sygnałów. Podano algorytmy i schematy implementacji filtracji cyfrowej w szeregowej arytmetyce rozproszonej. Omówiono sposób dekompozycji tablic LUT oraz realizację decymacji sygnału i implementację filtrów symetrycznych.

EN

The paper contains a short introduction to the wavelet representation of signals and to the two-channel filter bank that allows wavelet decomposition and reconstruction. Efficient distributed arithmetic architectures for digital filtering, suitable for FPGA implementation, are derived and illustrated. A partitioning algorithm that avoids large lookup tables is described. Circuit structures that implement signal decimation and make use of filter symmetry are shown.

8

Implementacja operacji konwolucji o stałych współczynnikach w układach FPGA

Jamro E., Wiatr K.

Kwartalnik Elektroniki i Telekomunikacji

|

2003

|

Vol. 49, z. 3

295-315

PL

W artykule omówiono różne architektury układu konwolwera zoptymalizowane pod kątem implementacji w układach programowalnych FPGA. Współczynniki konwolucji są stałe, jednakże układ FPGA może być szybko przeprogramowany co pozwala na zmianę tych współczynników. Mnożenie jest najbardziej skomplikowaną operacją wykonywaną podczas obliczania operacji konwolucji, dlatego w pierwszej części artykułu zostaną omówione układy mnożące. Niemniej rozbicie operacji mnożenia wewnątrz układu konwolwera pozwala na lepszą optymalizację operacji konwolucji, dlatego w artykule porównano dwie architektury: układ konwolwera zbudowany przy użyciu pamięci LUT (ang. Look-Up Table) i nazywany LC (ang. LUT-based Convolver) oraz układ konwolwera zbudowany jako suma mnożeń wykonywanych przy użyciu układów mnożących opartych o pamięć LUT i nazywanych LM (ang. LUT based Multiplier). Ponadto omówiono alternatywną technikę: konwolwer oparty na (równoległej) arytmetyce rozproszonej (ang. Parallel) i nazywaną DAC (ang. Distributed Arithmetic Convolver). Głównym celem tego artykułu jest przedstawienie nowej architektury układu konwolwera wykorzystującego nieregularną arytmetykę rozproszoną IDAC (ang. Irregular Distributed Arithmetic Convolver), która to w porównaniu z architekturą DAC jest nieregularna, a przez to pozwala na lepszą optymalizację układu konwolwera. Wszystkie architektury konwolwera omówione w tym artykule mogą być automatycznie generowane przez autorskie narzędzie AuToCon.

EN

This paper reviews different architectural solutions for calculating the constant coefficient convolution operation in FPGAs. First, different multiplier architectures are discussed. Breaking up the multiplier as a separate entity inside the convolver allows for further circuit optimisations; therefore the Look-Up-Table (LUT) based Convolver (LC) is compared with the sum of LUT-based Multipliers (LM). Further, an alternative technique, the (Parallel) Distributed Arithmetic Convolver (DAC), is discussed. The key contribution of this paper is, however, a novel architectural solution: the Irregular Distributed Arithmetic Convolver (IDAC), which, in comparison to the DAC, has an irregular form and therefore allows for better circuit optimisation. All architectural solutions described here can be automatically generated by the Automated Tool for generation Convolvers in FPGAs (AuToCon).

9

Optymalizacja drzewa dodającego implementowanego w układach FPGA z wykorzystaniem programowania genetycznego i "Simulated Annealing"

Wiatr K., Jamro E.

Kwartalnik Elektroniki i Telekomunikacji

|

2002

|

Vol. 48, z. 3/4

591-606

PL

Operacja dodawania jest podstawową operacją w realizacji wielu algorytmów przetwarzania danych (np. podczas obliczania operacji konwolucji - filtracji typu FIR o stałych współczynnikach). W układach FPGA (ang. Field Programmable Gate Arrays) operacja dodawania powinna być implementowana z wykorzystaniem układu dodającego z przeniesieniem skrośnym RCA (ang. Ripple Carry Adder), w porównaniu z układami ASIC, dla których optymalną architekturą jest układ dodający z przechowaniem przeniesienia CSA (ang. Carry Save Adder). W konsekwencji dla układów FPGA powinno się użyć innych metod optymalizacji drzewa dodającego niż dla układów ASIC. W artykule tym zostały przedstawione dwa takie algorytmy: programowanie genetyczne GP (ang. Genetic Programming) i Simulated Annealing SA (symulowane wyżarzanie). Algorytmy te zostały porównane z uprzednio użytymi metodami przeszukiwania wyczerpującego ES (ang. Exhaustive Search) oraz algorytmu zachłannego GrA (ang. Greedy Algorithm). W rezultacie wyniki otrzymane przez SA są lepsze niż dla GP oraz SA daje około 10÷20% oszczędności w porównaniu z GrA. Dlatego optymalnym rozwiązaniem jest użycie algorytmu ES dla liczby wejść do bloku dodającego N<8 oraz SA dla N>8. W przypadku gdy decydującym czynnikiem jest czas znalezienia optymalnego drzewa zalecany jest algorytm GrA.

EN

Addition is a very basic operation employed in numerous processes, e.g. constant coefficient FIR filters. In Field Programmable Gate Arrays (FPGAs), addition should be carried out in the standard way, employing ripple-carry adders rather than carry-save adders, as is usually the case for ASICs. Consequently, different adder optimisation techniques should be used in order to reduce the area occupied by the adder tree. In this paper, implementations of two different optimisation techniques, Genetic Programming (GP) and Simulated Annealing (SA), are described. The implementation results of these techniques are compared to the previously published results for the Exhaustive Search (ES) and the Greedy Algorithm (GrA). The SA usually outperforms the GP, and the SA gives about 10÷20% area reduction in comparison to the GrA. In conclusion, for the number of inputs to an adder tree N<8, the ES is the recommended algorithm, as the number of possible combinations is usually acceptable; otherwise the SA should be employed. When the time needed to find the optimal adder tree is a critical factor, the GrA is recommended.

10

Implementacja układów dodających wchodzących w skład konwolwera w układach programowalnych FPGA

Wiatr K., Jamro E.

Kwartalnik Elektroniki i Telekomunikacji

|

2002

|

Vol. 48, z. 3/4

571-589

PL

Operacja dodawania jest podstawową operacją wykonywaną podczas obliczania operacji konwolucji (filtracji typu FIR) o stałych współczynnikach. W układach FPGA operacja dodawania powinna być implementowana z wykorzystaniem układu dodającego z przeniesieniem skrośnym RCA (ang. Ripple Carry Adder), w porównaniu z układami ASIC, dla których optymalną architekturą jest układ dodający z przechowaniem przeniesienia CSA (ang. Carry Save Adder). W konsekwencji w niniejszym opracowaniu zostały przedstawione różne algorytmy znajdujące optymalną sieć połączeń w bloku dodającym: przeszukiwania wyczerpującego ES (ang. Exhaustive Search), algorytmu zachłannego GrA (ang. Greedy Algorithm). Ponadto zostały przedstawione różne architektury układu konwolwera w układach FPGA oraz ich wpływ na parametry wejściowe układu dodającego, w szczególności zakresu danych wejściowych (wartość minimalna i maksymalna) oraz korelacji pomiędzy wejściami.

EN

Addition is a fundamental operation for constant coefficient convolutions (FIR filters). In FPGAs, addition should be carried out employing ripple-carry adders rather than carry-save adders, as is the case for ASIC designs. Therefore different adder optimisation techniques are required; as a result, the Exhaustive Search and the Greedy Algorithm have been implemented. Different convolver architectures and, consequently, different input parameters of the adder block, e.g. input width and correlation between different inputs, are described.

11

Układy mnożące przez stały współczynnik implementowane w układach programowalnych FPGA

Wiatr K., Jamro E.

Kwartalnik Elektroniki i Telekomunikacji

|

2001

|

Vol. 47, z. 2

233-253

PL

Poniższy artykuł przedstawia różne architektury równoległe układów mnożących o stałym współczynniku mnożenia, implementowanych w układach programowalnych FPGA. W pierwszej części artykułu zostały opisane układy mnożące bezmnożne MM (ang. Multiplierless Multiplication). Układy MM wykorzystują reprezentacje kanoniczną cyfry ze znakiem CSD (ang. Canonic Sign Digit) lub / i dzielenie wspólnej podstruktury SS (ang. Sub-structure Sharing). Opisany został również nowy, zoptymalizowany pod kątem generowanego układu MM algorytm konwersji z kodu uzupełnień do dwóch do reprezentacji CSD. Druga część artykułu została poświęcona układom mnożącym wykorzystującym pamięć typu LUT (ang. Look-Up Table) i nazywanym w skrócie LM (ang. LUT based Multiplication). W konsekwencji opisano wykorzystywanie różnych modułów pamięci oraz znajdowanie optymalnej kombinacji pamięć - układ dodający. Dla układów mnożących LM rozważona została również redukcja szerokości magistrali adresowej dla każdej komórki pamięci jak również możliwość dzielenia wspólnej pamięci dla komórek pamięci o tej samej zawartości. W ostatniej części artykułu podano wyniki implementacji dla układów firmy Xilinx serii XC4000 oraz Virtex.

EN

This paper investigates different architectures implementing bit-parallel constant coefficient multiplication in FPGA structures. First, the multiplierless multiplication (MM) architectures employing Canonic Sign Digit (CSD) and sub-structure sharing methods are addressed, and a novel algorithm for the conversion from two's complement to CSD is presented. In the second part of the paper, Look-up table based Multiplication (LM) is investigated. Correspondingly, the usage of different memory modules and the search for the optimal combination of memory and adders are considered. The LM architecture also considers reduction of the address width for each memory cell and the possibility of memory sub-structure sharing (a search for identical memory cells is implemented). Finally, the implementation results for the Xilinx XC4000 and Virtex families are presented. The MM generally surpasses the LM architecture; however, the actual choice between these two architectures depends on the coefficient and the input parameters.
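As an illustration of why CSD suits multiplierless multiplication, the sketch below converts an integer to its canonic signed-digit form and multiplies by shift-and-add. It uses the standard non-adjacent-form recurrence, which is an assumption here and not the novel conversion algorithm the paper presents; the names `to_csd` and `csd_multiply` are hypothetical.

```python
def to_csd(n):
    """Return the CSD digits of integer n, least significant first.

    Each digit is -1, 0, or +1, and no two adjacent digits are both nonzero.
    """
    digits = []
    while n != 0:
        if n & 1:
            # +1 when n mod 4 == 1, -1 when n mod 4 == 3; subtracting the
            # digit leaves a multiple of 4, so the next digit is forced to 0.
            d = 2 - (n & 3)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def csd_multiply(x, digits):
    """Multiply x by the constant encoded in `digits` using shifts and adds only."""
    acc = 0
    for shift, d in enumerate(digits):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc
```

For the coefficient 23 (binary 10111, four nonzero bits) the CSD form is 32 - 8 - 1, so one fewer adder or subtractor is needed than with the plain binary expansion.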

12

Implementacja szybkich układów mnożących w strukturach FPGA

Wiatr K., Jamro E.

Kwartalnik Elektroniki i Telekomunikacji

|

2001

|

Vol. 47, z. 4

495-514

PL

Artykuł ten prezentuje różne rozwiązania szybkiego układu mnożącego implementowanego w układach reprogramowalnych FPGA. Przedstawione rozwiązania to: pełno-funkcjonalny układ mnożący o zmiennym współczynniku mnożenia VCM (ang. Variable Coefficient Multiplier), układ mnożący przez stały współczynnik KCM (ang. Constant Coefficient Multiplier) oraz rozwiązanie pośrednie - układ mnożący przez stały współczynnik z możliwością dynamicznej rekonfiguracji DKCM (ang. Dynamic Constant Coefficient Multiplier). Dla układów FPGA, które mogą być szybko przeprogramowane, wybór pomiędzy VCM i KCM jest trudnym zagadnieniem, któremu ten artykuł poświęca dużo uwagi. Co więcej istnieje rozwiązanie pośrednie - układ DKCM, który może być szybciej przeprogramowany niż KCM, ale zajmuje więcej zasobów układu FPGA. W układach FPGA wybór architektury układu mnożącego jest uzależniony od trzech czynników: zajmowanych zasobów, czasu propagacji oraz czasu przeprogramowania. W celu zwiększenia szybkości projektowania układu mnożącego zostało opracowane narzędzie do automatycznej generacji optymalnej architektury układu mnożącego w postaci kodu języka VHDL, na podstawie parametrów wejściowych.

EN

This paper studies different solutions for carrying out multiplication: a fully functional multiplier denoted as the Variable Coefficient Multiplier (VCM), the Constant Coefficient Multiplier (KCM), and a self-configurable multiplier denoted as the Dynamic Constant Coefficient Multiplier (DKCM). For FPGAs, which can be easily reconfigured, the choice between the VCM and the KCM is not straightforward. Furthermore, the DKCM is an additional, middle-way solution between the KCM and the VCM, as it offers a shorter reprogramming time but occupies more area in comparison with the KCM. In FPGAs, the choice of the optimum multiplier involves three factors: area, propagation time and reconfiguration time, which have been thoroughly studied, and the respective implementation results are given. Furthermore, to speed up the implementation of multipliers, a design-automation tool has been developed which generates an optimum (for given input parameters) VHDL description of multipliers.
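The idea behind a LUT-based constant coefficient multiplier can be sketched as follows. This is a generic illustration, not the architecture generated by the authors' tool: the function name `kcm_multiply`, the unsigned 8-bit input, and the 4-bit chunk size are all assumptions.

```python
def kcm_multiply(k, x, in_width=8, chunk=4):
    """Multiply unsigned x (0 <= x < 2**in_width) by the constant k.

    The input is split into `chunk`-bit slices. Each slice addresses a small
    table of precomputed multiples of k, and the shifted partial products
    are summed by an adder tree (here a plain accumulation).
    """
    # One table serves every slice, because all slices of an unsigned
    # input share the same contents (multiples of k).
    table = [k * v for v in range(1 << chunk)]
    mask = (1 << chunk) - 1
    acc = 0
    for shift in range(0, in_width, chunk):
        acc += table[(x >> shift) & mask] << shift
    return acc
```

Changing the constant only requires rewriting the table contents, which is the property that makes reconfiguration time one of the three selection factors named in the abstract.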