Search results - BazTech

1

Hardware-aware tiling optimization for multi-core systems

Adamski D., Jabłoński G.

Computer Science

|

2017

|

Vol. 18 (2)

145-162

EN

This paper presents a proposal for a new tool that improves tiling efficiency for a given hardware architecture. It also describes the correlation between changes in hardware architecture and software optimization methods. The first chapter briefly describes the changes in hardware architecture that have occurred over the past ten years. The second chapter provides an overview of the tools used in further research. The subsequent sections describe the proposed hardware-aware tool for optimal tiling.
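The idea of deriving a tile size from hardware parameters can be illustrated with a minimal Python sketch. It is not the tool proposed in the paper: the rule that three square tiles (one each of A, B and C) must fit into a single cache level, and the hypothetical functions `choose_tile_size` and `tiled_matmul`, are assumptions made here for illustration.

```python
from math import isqrt


def choose_tile_size(cache_bytes: int, elem_bytes: int = 8) -> int:
    """Largest square tile edge T with 3 * T * T * elem_bytes <= cache_bytes.

    Assumed heuristic (not from the paper): one T x T tile of each of
    A, B and C should fit in the cache level being targeted.
    """
    return max(isqrt(cache_bytes // (3 * elem_bytes)), 1)


def tiled_matmul(a, b, n, tile):
    """C = A * B for n x n matrices stored as lists of lists.

    The three outer loops walk over tile x tile blocks; the three inner
    loops multiply one block pair, so the working set stays small.
    """
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for kk in range(0, n, tile):
            for jj in range(0, n, tile):
                for i in range(ii, min(ii + tile, n)):
                    a_row, c_row = a[i], c[i]
                    for k in range(kk, min(kk + tile, n)):
                        aik, b_row = a_row[k], b[k]
                        # Unit-stride innermost loop over row k of B.
                        for j in range(jj, min(jj + tile, n)):
                            c_row[j] += aik * b_row[j]
    return c
```

With an assumed 32 KiB L1 data cache and 8-byte elements, `choose_tile_size(32 * 1024)` returns 36, since 3 * 36 * 36 * 8 = 31104 bytes fits while a tile edge of 37 would not. The rule ignores associativity, line size and cache sharing between cores.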

2

Dynamic tile free scheduling for code with acyclic inter-tile dependence graphs

Bielecki W., Skotnicki P.

Computer Science

|

2017

|

Vol. 18 (2)

195-216

EN

Free scheduling is a task ordering technique under which instructions are executed as soon as their operands become available. Coarsening the grain of computations under the free schedule, by using groups of loop nest statement instances (tiles) in place of single statement instances, increases the locality of data accesses and reduces the number of synchronization events, and consequently improves program performance. The paper presents a code generation approach that enables free scheduling of tiles of arbitrarily nested affine loops at run time. The applicability of the introduced algorithms is limited to tiled loop nests whose inter-tile dependence graphs are acyclic. The approach is based on the polyhedral model. Results of experiments with the PolyBench benchmark suite, demonstrating significant speed-up of the tiled code, are discussed.
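The free schedule of an acyclic inter-tile dependence graph assigns each tile the earliest step at which all of its predecessors have finished, so tiles in the same step can run concurrently. The sketch below computes these steps (wavefronts) for an explicitly enumerated graph using Kahn's topological ordering. It is a simplification made here: the paper generates code from the polyhedral model and schedules tiles at run time, whereas the hypothetical `free_schedule` function assumes the whole graph is known in advance.

```python
from collections import deque


def free_schedule(tiles, deps):
    """Group tiles into wavefronts of the free schedule.

    tiles: iterable of hashable tile identifiers.
    deps:  iterable of (src, dst) pairs meaning dst depends on src.
    Returns a list of wavefronts; tiles in one wavefront are mutually
    independent. Raises ValueError if the graph contains a cycle.
    """
    succ = {t: [] for t in tiles}
    indeg = {t: 0 for t in succ}
    for src, dst in deps:
        succ[src].append(dst)
        indeg[dst] += 1

    level = {t: 0 for t in succ}
    ready = deque(t for t in succ if indeg[t] == 0)
    done = 0
    while ready:
        t = ready.popleft()
        done += 1
        for s in succ[t]:
            # A tile can start one step after its latest predecessor.
            level[s] = max(level[s], level[t] + 1)
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)

    if done != len(succ):
        raise ValueError("inter-tile dependence graph contains a cycle")

    waves = [[] for _ in range(max(level.values(), default=-1) + 1)]
    for t in succ:
        waves[level[t]].append(t)
    return waves
```

For a 3 x 3 grid of tiles in which each tile depends on its upper and left neighbours, the sketch yields five wavefronts, with tile (i, j) in wavefront i + j. A run-time variant would instead dispatch each tile as soon as its in-degree counter reaches zero.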

3

Obliczeniowe szacowanie czasu wykonania programu [Computational estimation of program execution time]

Kamińska A.

Pomiary Automatyka Kontrola

|

2012

|

Vol. 58, no. 2

193-195

PL

Determining a program's execution time by running it is not always feasible in practice, for example in iterative compilation, because it greatly lengthens software development time. In many situations, however, there is no need to determine this time exactly; an estimate would suffice. This article proposes a method for computationally estimating a program's execution time based solely on the form of its source code and known parameters of the hardware environment.

EN

Program execution time is one of the criteria taken into account when assessing software quality in the broad sense. The general goal is to make execution time as short as possible. Execution time depends on many very different factors, the most obvious being the form of the program's source code and the hardware environment in which it runs. In practice, even a very minor change in the source code can significantly change execution time, and a slight change in hardware parameter values can have the same effect. Although execution time is a very simple quality criterion to interpret, it is sometimes very difficult to measure precisely, and taking the necessary measurements requires running the program. However, there is often no need to know this time precisely; it would be sufficient to estimate it with an error known in advance. Using the matrix multiplication problem as a reference, the paper proposes a method for estimating the execution time of a program based only on its source code and hardware parameters known a priori. The method builds a mathematical model that combines a statistical approach with Wolfe's method for calculating data locality. The paper discusses the results of applying the model to a control sample and indicates directions for further work.

4

Obliczeniowe szacowanie lokalności danych dla programów ANSI-C [Computational estimation of data locality for ANSI-C programs]

Kraska K., Wierciński T., Kamińska A.

Pomiary Automatyka Kontrola

|

2011

|

Vol. 57, no. 8

951-953

PL

Data access locality is a critical factor determining the computational performance of software. Compilation tools are therefore expected to automate the transformation of suboptimal code into a form with high data locality. The article presents an approach for estimating the data locality of programs from their ANSI-C source code. The results of experimental studies are discussed and directions for further work are indicated.

EN

Good data locality, understood as placing program data in memory so that the data requested by the processor are available immediately on demand, is a critical software requirement for achieving high data processing efficiency. One way to achieve good data locality is to transform source code at the compilation stage so as to improve its use of cache memory and thus fully benefit from the memory hierarchy. Modern compilers are expected to perform this kind of optimization automatically by applying relevant transformations. To select the best transformation for a given source code, the compiler should be able to compare the available transformations from this point of view and indicate the one that produces semantically identical code with the shortest possible execution time. The paper briefly describes Wolfe's method of estimating data locality through calculations carried out directly on the source code under analysis, without time-consuming compilation of the source code into executable form and without collecting memory access metrics at run time. The paper also outlines how the authors implemented, in C++, a software module that estimates data locality for ANSI-C source code based on Wolfe's method. The results of applying the proposed approach to selected source codes are discussed and directions for further work are indicated.

5

Obliczeniowe szacowanie lokalności danych na poziomie pamięci podręcznej [Computational estimation of data locality at the cache memory level]

Kamińska A., Bielecki W.

Metody Informatyki Stosowanej

|

2011

|

no. 4

33-44

EN

To use cache memory effectively, it is essential to ensure good data locality at the cache level. This can be achieved by transforming the source code of a program into a semantically equivalent form. The problem, however, is how to assess a program's data locality based only on the form of its source code, and how to use this assessment to select the form with the shortest execution time. The paper presents Wolfe's method of estimating data locality and, using the matrix multiplication problem as a reference, discusses the possibility of applying it to estimate program execution time. The paper also presents software developed by the authors for estimating data locality.
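The kind of source-level estimate described above can be illustrated with a simplified cache-line counting model for the matrix multiplication loop nest. This sketch is written in the spirit of such estimates and is not a faithful reproduction of Wolfe's method: it assumes row-major arrays, N iterations per loop, `line_elems` elements per cache line and no reuse between iterations of the outer loops. `MATMUL_REFS`, `ref_lines`, `loop_order_cost` and `rank_orders` are hypothetical names.

```python
from itertools import permutations
from math import ceil

# Each array reference: (array name, subscripts as loop variables in
# row-major order, i.e. the last subscript varies fastest in memory).
# Loop body assumed: C[i][j] += A[i][k] * B[k][j]
MATMUL_REFS = [("C", ("i", "j")), ("A", ("i", "k")), ("B", ("k", "j"))]


def ref_lines(subscripts, inner, n, line_elems):
    """Cache lines one reference touches during one run of the inner loop."""
    if inner not in subscripts:
        return 1                      # invariant: the same element every time
    if subscripts[-1] == inner and subscripts.count(inner) == 1:
        return ceil(n / line_elems)   # unit stride: neighbours share a line
    return n                          # large stride: a new line each iteration


def loop_order_cost(order, refs, n, line_elems):
    """Estimated cache lines for the whole nest, loops listed outermost first."""
    inner = order[-1]
    per_inner_run = sum(ref_lines(s, inner, n, line_elems) for _, s in refs)
    return per_inner_run * n ** (len(order) - 1)


def rank_orders(refs, loops, n, line_elems):
    """All loop permutations, sorted from lowest to highest estimated cost."""
    return sorted(permutations(loops),
                  key=lambda o: loop_order_cost(o, refs, n, line_elems))
```

With N = 100 and 8 elements per line, the model estimates 1,140,000 lines for the `i, j, k` order, 270,000 for `i, k, j` and 2,010,000 for `j, k, i`, so it prefers orders with `j` innermost, which agrees with the usual observation that row-major matrix multiplication benefits from a unit-stride inner loop.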

6

Model obliczeniowego szacowania czasu wykonania programu [A model for computational estimation of program execution time]

Kamińska A.

Metody Informatyki Stosowanej

|

2011

|

no. 4

125-134

EN

Program execution time is one of the criteria taken into account during assessment of software quality. It is sometimes very difficult to precisely measure this time and carrying out necessary measurements requires running the program. However, there is very often no need to know this time precisely; it would be sufficient to estimate it with some error known in advance. The paper presents the proposal and assumptions of a model for estimating the time of execution of a program, based only on its source code. The paper introduces a sample statistical model which can be used for this purpose. It was created based on empirical data collected for the matrix multiplication problem. The paper also presents an analysis of possibilities of applying the above-mentioned statistical model to some other programs.
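A statistical model of this kind can be sketched under an assumption made here, not taken from the paper: that the execution time of an n x n matrix multiplication behaves like t(n) = a + c * n**3. The coefficients would be fitted by ordinary least squares to timings measured on the target machine; the functions `fit_cubic_model` and `predict` are hypothetical, and no measured data are implied.

```python
def fit_cubic_model(sizes, times):
    """Least-squares fit of t(n) = a + c * n**3 to measured (n, t) pairs.

    Returns (a, c). Requires at least two distinct problem sizes.
    """
    xs = [n ** 3 for n in sizes]
    m = len(xs)
    if m < 2 or m != len(times):
        raise ValueError("need at least two (size, time) pairs")
    mean_x = sum(xs) / m
    mean_t = sum(times) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("problem sizes must not all be equal")
    sxt = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, times))
    c = sxt / sxx                # slope with respect to n**3
    a = mean_t - c * mean_x      # constant overhead
    return a, c


def predict(model, n):
    """Estimated execution time for problem size n under the fitted model."""
    a, c = model
    return a + c * n ** 3
```

A purely cubic model ignores cache effects, which is exactly what the locality estimates in the related entries are meant to capture; in practice such a model would be checked against a control sample, as the abstracts above describe.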

7

Experimental study on data locality of parallel programs executing synchronization-free threads of computations

Kraska K., Siedlecki K.

Pomiary Automatyka Kontrola

|

2010

|

Vol. 56, no. 12

1504-1508

EN

The effective use of hierarchical memory by parallel shared-memory programs requires good data locality. The paper presents an analysis and experimental study of data locality in the L1D cache for parallel programs executing synchronization-free threads of computation, derived from the NAS Parallel Benchmarks. The parallel synchronization-free programs were implemented using the OpenMP standard. Experiments were carried out on an Intel SMP architecture. The Intel VTune Performance Analyzer was used to collect and evaluate data locality metrics. Finally, conclusions about the data locality characteristics of synchronization-free parallel programs are given.

PL

Efficient use of modern shared-memory multiprocessor architectures, which employ a multi-level data access hierarchy, requires programs that compute in parallel in independent threads to have good data locality characteristics. This article presents an experimental study and analysis of data locality for programs taken from the standard NAS Parallel Benchmark suite that perform computations in independent threads created with OpenMP parallel directives. Data locality characteristics were determined for the first-level data cache (L1D). All experiments were carried out on an Intel SMP architecture running the Linux operating system. The Intel VTune Performance Analyzer was used to obtain the metric values needed to estimate data locality. Based on the observations, an attempt was made to formulate final conclusions.

8

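Koncepcja metody zwiększania lokalności danych na poziomie pamięci podręcznej oparta na transformacjach pętli programowych [Concept of a method for increasing data locality at the cache memory level based on program loop transformations]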

Kraska K., Kamińska A.

Metody Informatyki Stosowanej

|

2010

|

no. 2 (23)

63-72

PL

The article discusses the problem of data locality and presents existing techniques for increasing it that transform the source code of loops to make better use of the processor's cache memory. It also presents the concept of a method for increasing data locality at the cache memory level, based on known program loop transformations and on a computational and experimental analysis of data locality metrics. A conceptual model of a software module implementing the results of this research is presented.

EN

This paper outlines the idea of hierarchical memory organization, focusing on cache memory. It also briefly discusses popular software techniques and approaches that make better use of the specific nature and potential of cache memory. In this context, it presents the concept of a new method for shortening the execution time of executable programs. The new method aims to increase data locality at the cache memory level through program loop transformations. A proposal for applying the new method in practice is also described.

9

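Zastosowanie Intel® VTune™ Performance Analyzer do badania lokalności danych aplikacji równoległych opartych na tworzeniu niezależnych wątków obliczeń [Using the Intel® VTune™ Performance Analyzer to study data locality of parallel applications based on creating independent threads of computation]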

Kraska K.

Metody Informatyki Stosowanej

|

2009

|

no. 1 (18)

45-52

PL

The article presents the Intel® VTune™ Performance Analyzer, a tool for acquiring and collecting application data locality metrics, and its use in studying the data locality of parallel programs based on creating independent threads of computation. The research on data locality is part of scientific work on methods and algorithms for building parallelizing compilers, conducted at the Department of Software Engineering, Faculty of Computer Science, West Pomeranian University of Technology, using the infrastructure of the newly established HPC (High Performance Computing) laboratory. The use of the tool is demonstrated by analyzing a program loop contained in the UA Benchmark from the NAS Parallel Benchmarks 3.2 suite.

EN

A well-known way to speed up computations is to parallelize programs and execute them on multiprocessors. An innovative approach to extracting parallel synchronization-free threads of computation from program loops was presented in [1]. However, parallel programs consisting of synchronization-free threads of computation require good data locality to use the memory hierarchy effectively. The data locality of a program can be estimated from metrics collected by software analysis tools widely available on the market. The paper presents the use of the modern software analysis tool Intel® VTune™ Performance Analyzer for collecting and evaluating data locality metrics. An experimental parallel program running synchronization-free threads of computation, implemented in C++, assigned to parallel threads by means of OpenMP directives and executed on a target Intel SMP architecture, is used to demonstrate a practical analysis with this tool.
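Synchronization-free threads correspond to groups of loop iterations that are not connected by any dependence. The brute-force sketch below finds such groups for a small loop nest with constant dependence distances, taking connected components of the iteration dependence graph with a union-find structure. It is an illustrative assumption, not the method cited as [1], which derives slices symbolically; the hypothetical function `sync_free_slices` only works for iteration spaces small enough to enumerate.

```python
from itertools import product


def sync_free_slices(bounds, distances):
    """Group loop-nest iterations into synchronization-free slices.

    bounds:    trip count of each loop, e.g. (N, M) for a 2-deep nest.
    distances: constant dependence distance vectors, e.g. [(1, 0)] for
               a[i][j] = a[i-1][j] + ...
    Iterations linked by a dependence (in either direction) end up in the
    same slice, so different slices never need to synchronize.
    """
    iters = list(product(*(range(b) for b in bounds)))
    parent = {it: it for it in iters}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for it in iters:
        for d in distances:
            target = tuple(p + q for p, q in zip(it, d))
            if target in parent:           # dependence stays inside the space
                parent[find(it)] = find(target)

    slices = {}
    for it in iters:
        slices.setdefault(find(it), []).append(it)
    return list(slices.values())
```

For `a[i][j] = a[i-1][j] + ...` over a 4 x 3 iteration space, the distance vector (1, 0) gives three slices, one per column `j`; in a C++ implementation each slice could then be handed to a separate thread, for example through an OpenMP `parallel for` over slices.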

10

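Badania lokalności aplikacji równoległych bazujących na tworzeniu niezależnych wątków obliczeń [A study of the locality of parallel applications based on creating independent threads of computation]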

Bielecki W., Kraska K.

Elektronika : konstrukcje, technologie, zastosowania

|

2009

|

Vol. 50, no. 6

74-81

PL

Efficient use of the memory hierarchy requires good data locality from programs that process separated sequences of operations in parallel. The article presents an analysis and experimental study of L1D cache data locality for three selected program cases in which independent threads of computation, executed in program loops, were extracted using the method for determining independent threads of computation [1]. The cases considered were implemented in C++, assigned to parallel threads using OpenMP directives, and executed on a target Intel SMP architecture. The use of the Intel® VTune™ Performance Analyzer to collect metrics and assess the data locality of parallel programs is presented. Based on the results, recommendations are derived to help programmers develop software with good data locality.

EN

The effective use of hierarchical memory by parallel programs performing computations in slices requires good data locality. The paper presents analysis and experimental studies of data locality in the L1D cache for three selected cases of parallel programs consisting of synchronization-free threads of computation extracted by means of the method described in [1]. The cases considered were implemented in C++, assigned to parallel threads by means of OpenMP directives and executed on a target Intel SMP architecture. The use of the software analysis tool Intel® VTune™ Performance Analyzer for collecting and evaluating data locality metrics is presented. Finally, recommendations are given to help software developers build numerical applications with good data locality metrics.

11

Zwiększenie lokalności programów równoległych wykonywanych w systemach osadzonych [Increasing the locality of parallel programs executed in embedded systems]

Bielecki W., Kraska K.

Pomiary Automatyka Kontrola

|

2008

|

Vol. 54, no. 8

464-468

PL

Increasing data locality in a program is an essential element of improving the performance of the software parts of an embedded system, reducing energy consumption and reducing on-chip memory size. The article presents the complementary use of a data locality estimation method together with a new method for extracting threads and agglomerating them to fit the capabilities of the target architecture, using various types of loop iteration partitioning (space-time mapping) and taking into account the effect of applying known data locality improvement techniques. Choosing the best combination of code transformations with respect to data locality makes it possible to improve program performance with respect to the factors mentioned. An approach to data locality analysis for selected loops is presented, experimental results are presented and discussed, and directions for further work are indicated.

EN

Increasing data locality in a program is necessary to improve the performance of the software parts of embedded systems, decrease power consumption and reduce on-chip memory size. The paper introduces the possibility of applying a method of quantifying data locality to a novel method of extracting synchronization-free threads. It can be used to agglomerate the extracted synchronization-free threads in order to adapt a parallel program to the target architecture of an embedded system, taking into account various loop schedule options (space-time mapping) and the influence of well-known data locality improvement techniques. Choosing the best combination of loop transformation techniques with regard to data locality makes it possible to improve program performance. A way of analyzing data locality is presented. Experimental results are shown and discussed. Conclusions and future research are outlined.
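Agglomerating extracted threads onto a fixed number of processors can follow different loop schedule options. The sketch below shows two common ones, block and cyclic mapping, for slices numbered 0 to S-1. It is an assumption made for illustration: the hypothetical `agglomerate` function only produces candidate mappings, and how the paper's locality estimate ranks them is not reproduced here.

```python
def agglomerate(num_slices, num_procs, schedule="block"):
    """Map slice indices 0..num_slices-1 onto num_procs processors.

    'block'  : contiguous chunks whose sizes differ by at most one
               (comparable to OpenMP schedule(static) without a chunk
               size, whose exact split is implementation-defined).
    'cyclic' : round-robin, slice s goes to processor s % num_procs
               (comparable to OpenMP schedule(static, 1)).
    Returns one list of slice indices per processor.
    """
    if num_procs < 1:
        raise ValueError("num_procs must be positive")
    if schedule == "cyclic":
        return [list(range(p, num_slices, num_procs)) for p in range(num_procs)]
    if schedule == "block":
        base, extra = divmod(num_slices, num_procs)
        groups, start = [], 0
        for p in range(num_procs):
            # The first `extra` processors take one additional slice.
            size = base + (1 if p < extra else 0)
            groups.append(list(range(start, start + size)))
            start += size
        return groups
    raise ValueError(f"unknown schedule: {schedule}")
```

For 10 slices on 3 processors, block mapping gives `[[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]` and cyclic mapping gives `[[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]`. Block mapping keeps neighbouring slices on one processor, which tends to help when adjacent slices touch adjacent data; cyclic mapping spreads work more evenly when slice costs vary along the index.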

12

Increasing data locality of parallel programs executed in embedded systems

Bielecki W., Kraska K.

Metody Informatyki Stosowanej

|

2008

|

no. 4 (vol. 17)

5-13

EN

Increasing data locality in a program is necessary to improve the performance of the software parts of embedded systems, decrease power consumption and reduce on-chip memory size. The paper introduces the possibility of applying a method of quantifying data locality to a novel method of extracting synchronization-free threads. It can be used to agglomerate the extracted synchronization-free threads in order to adapt a parallel program to the target architecture of an embedded system, taking into account various loop schedule options (space-time mapping) and the influence of well-known data locality improvement techniques. Choosing the best combination of loop transformation techniques with regard to data locality makes it possible to improve program performance. A way of analyzing data locality is presented. Experimental results are shown and discussed. Conclusions and future research are outlined.

13

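Zwiększenie wydajności aplikacji wykonywanych w systemach osadzonych poprzez zwiększenie lokalności danych [Increasing the performance of applications executed in embedded systems by increasing data locality]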

Bielecki W., Kraska K.

Pomiary Automatyka Kontrola

|

2007

|

Vol. 53, no. 7

86-88

PL

Efficient memory use is a critical condition for software to achieve high performance on modern architectures with a memory hierarchy. In embedded systems, efficient memory use by applications primarily makes it possible to lower hardware requirements for given performance criteria, reduce memory size and reduce energy consumption. These factors directly affect the cost of building an embedded system. Achieving highly efficient memory use requires developing software that takes data locality into account. Memory-intensive software, such as multimedia applications, usually processes large amounts of data stored in arrays within program loops. One way to increase the locality of such programs is to transform the program loops into more efficient code. The article presents the current state of research on program transformation methods increasing data locality.

EN

The effective use of the memory subsystem is a critical condition for software to achieve high performance on contemporary architectures with a memory hierarchy. In embedded systems, effective utilization of the memory subsystem mainly makes it possible to lower hardware requirements for established performance criteria, reduce memory size and decrease energy consumption. These factors directly influence the cost of building an embedded system. Achieving a highly efficient memory subsystem requires creating software with high data locality. Memory-intensive software, such as multimedia applications, usually processes considerable amounts of data stored in arrays within program loops. Transforming program loops into more efficient code is one way to improve data locality. The paper presents the state of the art of loop transformation methods that improve data locality. Additionally, the possibility of estimating the level of data locality and improving data locality for perfectly nested loops is examined. Finally, the results of analytical investigations illustrating the efficiency of the considered transformations are presented.

14

Zwiększenie lokalności programów wykonywanych w komputerach równoległych [Increasing the locality of programs executed on parallel computers]

Bielecki W., Kraska K.

Metody Informatyki Stosowanej

|

2007

|

no. 2 (vol. 12)

15-25

EN

Increasing data locality in a program is necessary to decrease its execution time. The paper introduces the possibility of applying a method of quantifying data locality to a new method of extracting parallel threads. It can be used to agglomerate the extracted synchronization-free threads in order to adapt a parallel program to the target architecture of a parallel computer, taking into account various loop schedule options (space-time mapping) and the influence of well-known data locality improvement techniques. An analysis of data locality for two loops is presented. Experimental results are discussed. Conclusions and future research are outlined.