Search results - BazTech

1

Hardware implementation of the orbital function for quantum-chemical computations

Wielgosz M., Jamro E., Russek P., Wiatr K.

Pomiary Automatyka Kontrola | 2010 | Vol. 56, no. 7 | 705-707

PL

This article presents the results of implementing a module that computes the value of an atomic orbital at a point. The module was a component of a unit that generates the value of the exchange-correlation potential used in quantum-chemical computations. The presented unit consists of pipelined floating-point blocks. The paper also reports the computational acceleration achieved relative to the general-purpose Itanium2 1.6 GHz processor.
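The abstract does not say which basis functions are used. As a minimal sketch, assuming contracted Cartesian Gaussian-type orbitals (a common choice, not confirmed by the source), evaluating an orbital at a point reduces to a polynomial prefactor times a finite sum of exponentials. The names `gto_value`, `powers`, `exponents` and `coefficients` are hypothetical.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def gto_value(point: Vec3, center: Vec3, powers: Tuple[int, int, int],
              exponents: Sequence[float], coefficients: Sequence[float]) -> float:
    """Value at `point` of a contracted Cartesian Gaussian orbital centred at `center`.

    phi(r) = dx**i * dy**j * dz**k * sum_p c_p * exp(-alpha_p * |r - A|**2)

    Assumption: normalisation constants are already folded into `coefficients`.
    """
    dx, dy, dz = (p - a for p, a in zip(point, center))
    r2 = dx * dx + dy * dy + dz * dz                   # squared distance to the centre
    i, j, k = powers
    angular = dx ** i * dy ** j * dz ** k              # polynomial (angular) prefactor
    radial = sum(c * math.exp(-alpha * r2)             # finite sum of exponentials
                 for alpha, c in zip(exponents, coefficients))
    return angular * radial


# Hypothetical p_x-type orbital with two primitives, evaluated at one grid point.
print(gto_value((0.5, 0.1, -0.2), (0.0, 0.0, 0.0), (1, 0, 0), [1.2, 0.3], [0.4, 0.7]))
```

In an accelerator, the same expression would be evaluated for a large number of grid points, which is why the exponential sum dominates the cost.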

EN

The paper presents FPGA acceleration and implementation results for the orbital function calculation employed in quantum chemistry. The orbital function core is composed of the authors' customized floating-point hardware modules. These modules are scalable from single to double precision and can work at frequencies ranging from 100 to 200 MHz. Besides the hardware implementation, the design process also involved reformulating the algorithm to adapt it to the platform profile. The computational procedure presented in this paper is part of the algorithm for generating the exchange-correlation potential and is also recognized as one of the most computationally intensive routines, which justifies the effort devoted to developing its hardware implementation. The precision of floating-point operations becomes a primary concern in low-level quantum chemistry procedures, so the authors took various measures to optimize these operations in terms of both resource consumption and processing speed.
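To illustrate why precision is a design parameter, the sketch below evaluates the same exponential sum twice: in double precision and with every intermediate rounded to IEEE-754 single precision (emulated with Python's `struct` module). The primitive exponents and coefficients are hypothetical, and this emulation is only an approximation of what a custom hardware datapath would do.

```python
import math
import struct
from typing import Sequence


def to_f32(x: float) -> float:
    """Round a Python float (an IEEE-754 double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def exp_sum(r2: float, exponents: Sequence[float], coefficients: Sequence[float],
            single: bool = False) -> float:
    """sum_p c_p * exp(-alpha_p * r2), optionally rounding every step to float32."""
    rnd = to_f32 if single else float
    acc = rnd(0.0)
    for alpha, c in zip(exponents, coefficients):
        arg = rnd(-rnd(alpha) * rnd(r2))
        term = rnd(rnd(c) * rnd(math.exp(arg)))
        acc = rnd(acc + term)
    return acc


# Hypothetical primitives, loosely shaped like a small contracted s orbital.
exps, coefs = [3.4, 0.62, 0.17], [0.15, 0.54, 0.44]
d = exp_sum(1.3, exps, coefs)
s = exp_sum(1.3, exps, coefs, single=True)
print(f"double={d!r} single={s!r} rel.err={abs(s - d) / d:.2e}")
```

With positive terms the single-precision result stays within a few units in the last place; sums with cancelling terms would lose more accuracy, which is the kind of trade-off a scalable single-to-double datapath lets a designer tune.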

2

Acceleration of floating-point computations on the RASC platform

Wielgosz M., Jamro E., Wiatr K.

Pomiary Automatyka Kontrola | 2009 | Vol. 55, no. 7 | 485-487

PL

The article presents the results of tests carried out to determine the maximum speed of floating-point operations on the RASC reconfigurable platform. The various available configuration modes of the Host and RASC units were implemented in order to identify the operating mode of the computing unit that is most efficient in terms of performance. The measurement results revealed that the combination of Direct I/O and DMA provides the highest throughput between the Host and RASC nodes. Nevertheless, for some applications the multi-buffering mode may be more suitable, because it allows data to be transferred while operations are being executed. The double-precision floating-point exp() function was used as an example application, which made it possible to estimate the computational acceleration achievable on the RASC platform.
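The advantage of multi-buffering can be seen with a simple two-stage timing model, sketched below under the assumption that each chunk has a fixed transfer time and a fixed compute time. With two buffers, chunk k+1 is transferred while chunk k is processed, so only the first transfer and the last computation remain exposed. The function names and per-chunk times are hypothetical.

```python
def sequential_time(n_chunks: int, t_transfer: float, t_compute: float) -> float:
    """Each chunk is transferred and then processed; nothing overlaps."""
    return n_chunks * (t_transfer + t_compute)


def double_buffered_time(n_chunks: int, t_transfer: float, t_compute: float) -> float:
    """Two buffers: chunk k+1 is transferred while chunk k is being processed.

    The first transfer and the last computation cannot be hidden; every other
    step costs whichever stage is slower.
    """
    if n_chunks <= 0:
        return 0.0
    return t_transfer + (n_chunks - 1) * max(t_transfer, t_compute) + t_compute


# Hypothetical per-chunk times in milliseconds.
for tt, tc in [(2.0, 1.0), (1.0, 1.0), (0.5, 2.0)]:
    print(tt, tc, sequential_time(8, tt, tc), double_buffered_time(8, tt, tc))
```

The gain approaches 2x when transfer and compute times are balanced and shrinks when one stage dominates, which is consistent with multi-buffering helping only some applications.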

EN

This paper presents the results of tests performed to determine the high-speed computation capabilities of the SGI RASC platform. Different data transfer modes and memory management approaches were examined to choose the most effective combination of Host and RASC memory settings. This work may be regarded as a case study of a contemporary FPGA-based accelerator that is nevertheless representative of this whole class of devices. The paper focuses strongly on the floating-point computation potential of the FPGA accelerator. From the processor's perspective, the RASC algorithm execution procedure is composed of several functions which reserve resources, queue commands and perform other preparation steps. It is noteworthy (Fig. 3) that the time consumed by these functions remains roughly the same regardless of the algorithm being executed. Once conducted, the resource reservation procedure allows the algorithm to be executed many times, which amounts to huge time savings, since the procedure takes approximately 7.5 ms, roughly 99% of the overall execution time of the algorithm. The rasclib_algorithm_commit and rasclib_algorithm_wait calls are considered the key part (Fig. 3) of the RASC software execution routine. The first one activates the FPGA and the second waits for the algorithm to finish; the time between these two calls is the data transfer and algorithm execution time. All curves (Fig. 4) reflect the overall processing time of the same amount of data, but differ in the size of a single data chunk, which varies from 1024 × 64 bit = 8 kB to 1048576 × 64 bit = 8 MB. Bigger chunks were observed to give much better effective execution times. However, above 1 MB the effective execution time decreases only slightly, which indicates saturation, so sending data in even bigger portions may not improve system performance much. The best effective execution time of a single exp() function in SRAM buffering mode is 12 ns, so 9.5 ns of it is transport overhead due to bus delays.
The theoretical calculation time of a single exp() function (ignoring data transfer) is 2.5 ns, because two exp() units are implemented on the RASC and clocked at 200 MHz. The obtained measurement results show that Direct I/O mode together with DMA transfer provides the highest data throughput between the Host and the RASC slice. Nevertheless, for some applications multi-buffering may be more suitable, because it allows data transfer and FPGA algorithm execution to proceed concurrently. The exponential function serves as the hardware acceleration example, which allows the maximum achievable data processing speed to be estimated.
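The timing figures quoted above can be reproduced with a few lines of arithmetic. The sketch uses only the numbers from the abstract (200 MHz, two exp() units, 12 ns measured, 7.5 ms setup at about 99% of a single run); the per-run time is merely what the 99% figure implies, and the helper names are hypothetical.

```python
def ideal_ns_per_result(clock_mhz: float, parallel_units: int) -> float:
    """Interval between results when `parallel_units` fully pipelined cores each
    deliver one result per clock cycle (data transfer ignored)."""
    cycle_ns = 1e3 / clock_mhz
    return cycle_ns / parallel_units


def setup_share(setup_ms: float, run_ms: float, executions: int) -> float:
    """Fraction of total time spent on a one-off setup reused by `executions` runs."""
    total = setup_ms + executions * run_ms
    return setup_ms / total


ideal = ideal_ns_per_result(200.0, 2)   # 5 ns cycle shared by two exp() units
measured = 12.0                         # ns per exp(), SRAM buffering mode (from the text)
overhead = measured - ideal             # transport overhead
print(ideal, overhead, ideal / measured)

# Per-run time implied if the 7.5 ms reservation is 99% of a single execution.
run_ms = 7.5 * (1 - 0.99) / 0.99
for n in (1, 10, 100, 1000):
    print(n, round(setup_share(7.5, run_ms, n), 3))
```

Reserving resources once and reusing them drives the setup share from 99% of a single run down to roughly 9% after a thousand runs under this model.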

3

FPGA implementation of a module computing the one-electron function

Wielgosz M., Jamro E., Wiatr K.

Automatyka / Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie | 2009 | Vol. 13, no. 3/1 | 1043-1050

PL

This article presents the results of implementing a module that computes the exponential part of an atomic orbital (the one-electron function). Generating one-electron functions is one of the most computationally demanding parts of the DFT procedure, so the authors decided to use FPGAs to accelerate this algorithm. The hardware module was implemented on the SGI RASC platform in a Virtex-4 LX200 FPGA. It consists of a series of floating-point units designed to operate in a pipelined manner at frequencies of up to 200 MHz. Preliminary tests showed a speed-up of about 5x compared with analogous computations on an Intel Itanium 2 1.6 GHz processor. It should be noted that the achieved speed-up is limited by the constraints of the platform (the width of the communication interface).
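A chain of pipelined floating-point units pays its latency only once: after a fill period equal to the pipeline depth, it emits one result per clock cycle. The sketch below models this for a datapath computing c * exp(-alpha * r2); the per-unit latencies are hypothetical, since the abstract does not give them.

```python
from typing import Dict


def pipeline_cycles(n_results: int, depth: int) -> int:
    """Clock cycles for a fully pipelined datapath of `depth` stages to emit
    `n_results` results: `depth` cycles to fill, then one result per cycle."""
    if n_results <= 0:
        return 0
    return depth + n_results - 1


def pipeline_time_us(n_results: int, depth: int, clock_mhz: float) -> float:
    """Same as pipeline_cycles, expressed in microseconds (cycles / MHz = us)."""
    return pipeline_cycles(n_results, depth) / clock_mhz


# Hypothetical latencies (in cycles) of the units in a c * exp(-alpha * r2) chain.
latencies: Dict[str, int] = {"mul_alpha_r2": 6, "exp": 30, "mul_c": 6, "add_tree": 8}
depth = sum(latencies.values())
print(depth, pipeline_time_us(1_000_000, depth, 200.0))
```

For long input streams the fill time is negligible, so throughput is set by the clock frequency rather than by the depth of the chain.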

EN

This paper presents an FPGA implementation of a module that calculates a finite sum of exponential products (the orbital function). The module is composed of several units, all of which are specially designed, fully pipelined floating-point modules optimized for high-speed operation at up to 200 MHz. Execution results revealed a speed-up of 5x for the finite sum of exponential products compared with an Intel Itanium 2 1.6 GHz processor. The orbital function is a computationally critical part of the Hartree-Fock algorithm. Therefore, the approach presented here aims to increase the performance of the whole quantum chemistry computational system by extending it with an FPGA-based accelerator composed of two Xilinx Virtex-4 LX200 chips. It is worth underlining that the achieved speed-up is limited by the external memory width constraint. Thus, next-generation FPGA-based accelerators can be expected to increase the speed-up simply by porting the design to them, without any changes to the module's architecture.
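The claim that a wider memory interface alone would raise the speed-up follows from a simple bound: the accelerator delivers results at the lower of its compute rate and the rate at which the interface can feed it. The sketch below assumes one result per cycle at 200 MHz, 16 bytes moved per result and a CPU reference rate of 4e7 results/s; all three are hypothetical, not values from the paper.

```python
def achievable_rate(clock_hz: float, results_per_cycle: float,
                    interface_bytes_per_s: float, bytes_per_result: float) -> float:
    """Results per second: the lower of the compute rate and the I/O rate."""
    compute_rate = clock_hz * results_per_cycle
    io_rate = interface_bytes_per_s / bytes_per_result
    return min(compute_rate, io_rate)


def speedup(accel_rate: float, cpu_rate: float) -> float:
    """Ratio of accelerator throughput to a reference CPU throughput."""
    return accel_rate / cpu_rate


# Hypothetical: 1 result/cycle at 200 MHz, 16 bytes per result, CPU at 4e7 results/s.
for bw in (1.6e9, 3.2e9, 6.4e9):
    r = achievable_rate(200e6, 1, bw, 16)
    print(f"{bw / 1e9:.1f} GB/s -> {r:.2e} results/s, speed-up {speedup(r, 4e7):.1f}x")
```

While the design is bandwidth-bound, doubling the interface width doubles the speed-up with an unchanged datapath; once the I/O rate exceeds the compute rate, further bandwidth no longer helps.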