Wyniki wyszukiwania - BazTech

Ograniczanie wyników

Znaleziono wyników: 2

Liczba wyników na stronie

Wyniki wyszukiwania

Wyszukiwano:
w słowach kluczowych: HPRC (High Performance Reconfigurable Computing)

Sortuj według:

Ogranicz wyniki do:

Prototyp systemu profilowania pętli kodu źródłowego jako narzędzia analizy kodu w celu efektywnego przyspieszenia obliczeń wielkiej skali

Pietroń M., Russek P., Wiatr K.

Automatyka / Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie

2010

T. 14, z. 3/2

925-938

Praca przedstawia badania nad metodologią przyspieszania aplikacji HPC na platformach HPRC (platformy HPC z układami FPGA). Najważniejszym zagadnieniem jest selekcja kodu źródłowego, który mógłby zostać przyspieszony. Największym utrudnieniem jest brak odpowiedniego narzędzia wspomagającego ten proces. Aplikacje HPC składają się z ogromnej ilości bardzo złożonego kodu źródłowego. Powoduje to, że niezbędny jest system automatycznej analizy kodu. Dodatkowo powstające języki wysokiego poziomu (HLL) do implementacji algorytmów w FPGA ułatwiają automatyzację transformacji i implementacji wybranego kodu w FPGA. Profiling pętli w kodzie źródłowym jest jednym z głównych kroków, który umożliwia sprawdzenie, czy dana aplikacja HPC jest możliwa do przyspieszenia w układach FPGA. Oprócz selekcji najbardziej czasochłonnych części kodu istotna jest także analiza danych wykorzystywanych w trakcie obliczeń. Przede wszystkim zależności między danymi i ich ilość odgrywa zasadnicze znaczenie. Dzięki tej informacji można optymalnie implementować algorytmy przez minimalizację częstotliwości komunikacji między CPU a układem FPGA.

This paper presents the research on FPGA based acceleration of HPC applications. The most important step to achieve this goal is to extract code that can be sped up. A major drawback is the lack of a tool which could do it. The HPC applications usually consist of a huge amount of complex source code. This is one of the reasons why the process of acceleration should be as automated as possible. Another reason is to make use of HLL (High Level Languages) such as Mitrion-C and Impulse-C. Loop profiling is one of the steps to check if the insertion of HLL to existing HPC source code is possible to gain acceleration of these applications. Hence the most important step to achieve acceleration is to extract the most time consuming code and data dependency, which makes the code easier to be pipelined and parallelized. Data dependency also gives information on how to implement algorithms in an FPGA circuit with the minimal initialization of it during the execution of algorithms.

Akceleracja obliczeń zmiennoprzecinkowych na platformie RASC

Wielgosz M., Jamro E., Wiatr K.

Pomiary Automatyka Kontrola

2009

R. 55, nr 7

485-487

W artykule zostały zaprezentowane wyniki testów przeprowadzonych w celu określenia maksymalnej szybkości wykonywania operacji zmiennoprzecinkowych na platformie rekonfigurowanej RASC. Zaimplementowano różne dostępne tryby konfiguracji jednostki Host oraz RASC w celu wyłonienia najbardziej efektywnego pod względem wydajności trybu pracy jednostki obliczeniowej. Uzyskane wyniki pomiarów ujawniały, że kombinacja Direct I/O oraz DMA zapewnia najwyższą przepustowość pomiędzy węzłami Host i RASC. Niemniej jednak dla niektórych aplikacji tryb multi-buffering może okazać się bardziej odpowiedni, ze względu na możliwość jednoczesnego przesyłania danych i wykonywania operacji. Funkcja exp() w standardzie zmiennoprzecinkowym o podwójnej precyzji została wykorzystana jako przykładowa aplikacja, która pozwoliła oszacowanie możliwej do uzyskania akceleracji obliczeń na platformie RASC.

This paper presents results of the tests performed to determine high speed calculations capabilities of the SGI RASC platform. Different data transfer modes and memory management approaches were examined to choose the most effective combination of the Host and RASC memory adjustments. That work may be regarded as a case study of the contemporary FPGA -based accelerator which, however, can characterize the whole branch of the devices. The paper is strongly focused on the floating point calculations potential of the FPGA accelerator. The RASC algorithm execution procedure, from the processor perspective, is composed of several functions which reserve resources, queue commands and perform other preparation steps. It is noteworthy (Fig. 3) that the time consumed by the functions remains roughly the same, independent of the algorithm being executed. The resource reservation procedure, once conducted, allows many executions of the algorithm -that amounts to huge time savings, since the procedure takes approximately 7.5 ms, which is roughly 99 % of the overall execution time of the algorithm. Rasclib algorithm commit and rasclib algorithm wait calls are considered to be the key (Fig. 3) part of the RASC software execution routine. The first one activates the FPGA between these two commands is the transfer and algorithm execution time. All curves (Fig. 4) reflect overall processing time of the same amount of data, but differ in size of the single data chunk which varies from 1024x64 bit = 8 kB to 1048576x64 bit = 8 MB. It has been observed that for the bigger chunk much better results are achieved in terms of the effective execution time. However, above 1 MB a decrease of the effective execution time seems to indicate saturation, therefore sending data in bigger portions may not improve the performance of the system so much. The most effective execution time of single exp() function for SRAM buffering mode is 12 ns, so 9,5 ns is transport overhead due to bus delays. The theoretical calculation time of single exp() function (data transfer is not taken into account) is 2,5 ns because two exp() are implemented on the RASC and clocked at 200 Mhz. The obtained measurement results show that Direct I/O mode together with DMA transfer provides the highest data throughput between the Host and RASC slice. Nevertheless, for some application multi-buffering can appear to be more suitable in terms of concurrent data transfer capabilities and FPGA algorithm execution. As a hardware acceleration example, there is considered an exponential function which allows estimating maximum achievable data processing speed.