Wyniki wyszukiwania - BazTech

1

Execution time prediction model for parallel GPU realizations of discrete transforms computation algorithms

Puchala Dariusz, Stokfiszewski Kamil, Wieloch Kamil

Bulletin of the Polish Academy of Sciences. Technical Sciences

|

2022

|

Vol. 70, nr 1

art. no. e139393

EN

Parallel realizations of discrete transforms (DTs) computation algorithms (DTCAs) performed on graphics processing units (GPUs) play a significant role in many modern data processing methods utilized in numerous areas of human activity. In this paper the authors propose a novel execution time prediction model, which allows for accurate and rapid estimation of execution times of various kinds of structurally different DTCAs performed on GPUs of distinct architectures, without the necessity of conducting the actual experiments on physical hardware. The model can serve as a guide for the system analyst in making the optimal choice of the GPU hardware solution for a given computational task involving particular DT calculation, or can help in choosing the best appropriate parallel implementation of the selected DT, given the limitations imposed by available hardware. Restricting the model to exhaustively adhere only to the key common features of DTCAs enables the authors to significantly simplify its structure, leading consequently to its design as a hybrid, analytically–simulational method, exploiting jointly the main advantages of both of the mentioned techniques, namely: time-effectiveness and high prediction accuracy, while, at the same time, causing mutual elimination of the major weaknesses of both of the specified approaches within the proposed solution. The model is validated experimentally on two structurally different parallel methods of discrete wavelet transform (DWT) computation, i.e. the direct convolutionbased and lattice structure-based schemes, by comparing its prediction results with the actual measurements taken for 6 different graphics cards, representing a fairly broad spectrum of GPUs compute architectures. Experimental results reveal the overall average execution time and prediction accuracy of the model to be at a level of 97.2%, with global maximum prediction error of 14.5%, recorded throughout all the conducted experiments, maintaining at the same time high average evaluation speed of 3.5 ms for single simulation duration. The results facilitate inferring the model generality and possibility of extrapolation to other DTCAs and different GPU architectures, which along with the proposed model straightforwardness, time-effectiveness and ease of practical application, makes it, in the authors’ opinion, a very interesting alternative to the related existing solutions.

2

Modified local ternary patterns technique for brain tumour segmentation and volume estimation from MRI multi-sequence scans with GPU CUDA machine

Sriramakrishnan Padmanaban, Kalaiselvi Thiruvenkadam, Rajeswaran Rangasami

Biocybernetics and Biomedical Engineering

|

2019

|

Vol. 39, no. 2

470--487

EN

The proposed work develops a rapid and automatic method for brain tumour detection and segmentation using multi-sequence magnetic resonance imaging (MRI) datasets available from BraTS. The proposed method consists of three phases: tumourous slice detection, tumour extraction and tumour substructures segmentation. In phase 1, feature blocks and SVM classifier are used to classify the MRI slices into normal or tumourous. Phase 2 contains fuzzy c means (FCM) algorithm to extract the tumour region from slices identified by phase 1. In addition, graphics processing unit (GPU) based FCM method has been implemented for reducing the processing time which is major overhead with FCM processing of MRI volumes. For phase 3, a novel probabilistic local ternary patterns (PLTP) technique is used to segment the tumour substructures based on the probability density value of histogram bins. Quantitative measures such as sensitivity, specificity, accuracy and dice values are used to analyses the performance of the proposed method and compare with state-of-art-methods. As post processing, the tumour volume estimation and 3D visualization were done for analyzing the nature and location of the tumour to the medical experts. Further, the availability of the GPU reduces the processing time up to 18 than serial CPU processing.

3

Fast ray casting of function-based surfaces

Vyatkin S. I., Romanyuk A. N., Pavlov S. V., Moskovko M. V., Askarova N., Sagymbekova A., Wójcik W., Kotyra A.

Przegląd Elektrotechniczny

|

2017

|

R. 93, nr 5

83--86

EN

This paper deals with the fast ray casting of high-quality images, a method of defining free forms without approximating them with polygons or patches, issues of using perturbation functions for animation of the surfaces of 3D objects. A method for visualizing functionally defined objects adapted for graphics processing units (GPU) is proposed.

PL

W artykule zaprezentowano metodę rzutowania promieni wykorzystującą tzw. funkcję perturbacji zamiast aproksymacji za pomocą wieloboków w odniesieniu do obrazów wysokiej rozdzielczości w celu animacji powierzchni obiektów trójwymiarowych. Ponadto, zaproponowana została metoda wizualizacji obiektów z wykorzystaniem procesorów graficznych (GPU).

4

Rough assessment of GPU capabilities for parallel PCC-based biclustering method applied to microarray data sets

Orzechowski P., Boryczko K.

Bio-Algorithms and Med-Systems

|

2015

|

Vol. 11, no. 4

243--248

EN

Parallel computing architectures are proven to significantly shorten computation time for different clustering algorithms. Nonetheless, some characteristics of the architecture limit the application of graphics processing units (GPUs) for biclustering task, whose function is to find focal similarities within the data. This might be one of the reasons why there have not been many biclustering algorithms proposed so far. In this article, we verify if there is any potential for application of complex biclustering calculations (CPU+GPU). We introduce minimax with Pearson correlation – a complex biclustering method. The algorithm utilizes Pearson’s correlation to determine similarity between rows of input matrix. We present two implementations of the algorithm, sequential and parallel, which are dedicated for heterogeneous environments. We verify the weak scaling efficiency to assess if a heterogeneous architecture may successfully shorten heavy biclustering computation time.

5

Akceleracja obliczeń kryptograficznych z wykorzystaniem procesorów GPU

Bęza P., Gocławski J., Mral P., Sapiecha P.

Studia Bezpieczeństwa Narodowego

|

2014

|

R. 4, Nr 6

341--357

PL

Problem spełnialności formuł rachunku zdań SAT jest jednym z fundamentalnych oraz otwartych zadań we współczesnej informatyce. Jest on problemem NP-zupełnym. To znaczy, że wszystkie problemy z klasy NP możemy sprowadzić do problemu SAT w czasie wielomianowym. Co ciekawe, wśród problemów z klasy NP istnieje wiele takich, które są ściśle związanych z kryptologią, na przykład: faktoryzacja liczb – ważna dla RSA, łamanie kluczy szyfrów symetrycznych, znajdowanie kolizji funkcji skrótu i wiele innych. Odkrycie wielomianowego algorytmu dla SAT skutkowałoby rozwiązaniem problemu milenijnego: P vs. NP. Cel ten wydaje się bardzo trudny do osiągnięcia – nie wiadomo nawet czy jest możliwy. Mając nieco mniejsze aspiracje możemy projektować algorytmy heurystyczne lub losowe dla SAT. W związku z tym, głównym celem autorów pracy jest przedstawienie projektu równoległego SAT Solvera bazującego na algorytmie WalkSAT, w tym procesu jego implementacji z wykorzystaniem środowiska programistycznego OpenCL oraz komputera wyposażonego w karty graﬁczne NVIDIA Tesla. Wraz z dynamicznym rozwojem technologii procesorów typu GPU oraz układów FPGA, jak również przenośnością rozwiązań stworzonych w Open CL, kierunek takich prac staje się interesujący ze względu na uzyskiwaną efektywność obliczeniową, jak również szybkość prototypowania rozwiązań.

EN

The Boolean satisﬁability problem SAT is one of the fundamental and open tasks in present-day information science. This problem is NP-complete. It means that all NP problems can be reduced to SAT in polynomial time. Interestingly, among NP problems, there are many closely related to cryptology, for example: factorization of numbers – important for RSA, breaking keys of symmetric ciphers, ﬁnding collisions of hash functions and many others. The discovery of the polynomial algorithm for SAT would result in resolving one of Millennium Prize Problems: P vs. NP. This objective seems to be hard to achieve – it’s unknown if it is even possible. With slightly lower aspirations, we can design heuristic or random algorithms for SAT. Therefore, the main goal of our study is to present a project of parallel SAT Solver based on WalkSAT algorithm, including its implementation using the OpenCL programming environment and a computer equipped with NVIDIA Tesla graphics cards. With the rapid development of GPU technology and FPGAs, as well as portability of solutions created in OpenCL, the direction of such works becomes interesting because of computational eﬃciency gained, as well as solution prototyping rate.

6

A Bi-objective Optimization Framework for Heterogeneous CPU/GPU Query Plans

Przymus P., Kaczmarski K., Stencel K.

Fundamenta Informaticae

|

2014

|

Vol. 135, nr 4

483--501

EN

Graphics Processing Units (GPU) have significantly more applications than just rendering images. They are also used in general-purpose computing to solve problems that can benefit from massive parallel processing. However, there are tasks that either hardly suit GPU or fit GPU only partially. The latter class is the focus of this paper. We elaborate on hybrid CPU/GPU computation and build optimization methods that seek the equilibrium between these two computation platforms. The method is based on heuristic search for bi-objective Pareto optimal execution plans in presence of multiple concurrent queries. The underlying model mimics the commodity market where devices are producers and queries are consumers. The value of resources of computing devices is controlled by supply-and-demand laws. Our model of the optimization criteria allows finding solutions of problems not yet addressed in heterogeneous query processing. Furthermore, it also offers lower time complexity and higher accuracy than other methods.

7

Graphics processing units in acceleration of bandwidth selection for kernel density estimation

Andrzejewski W., Gramacki A., Gramacki J.

International Journal of Applied Mathematics and Computer Science

|

2013

|

Vol. 23, no. 4

869--885

EN

The Probability Density Function (PDF) is a key concept in statistics. Constructing the most adequate PDF from the observed data is still an important and interesting scientific problem, especially for large datasets. PDFs are often estimated using nonparametric data-driven methods. One of the most popular nonparametric method is the Kernel Density Estimator (KDE). However, a very serious drawback of using KDEs is the large number of calculations required to compute them, especially to find the optimal bandwidth parameter. In this paper we investigate the possibility of utilizing Graphics Processing Units (GPUs) to accelerate the finding of the bandwidth. The contribution of this paper is threefold: (a) we propose algorithmic optimization to one of bandwidth finding algorithms, (b) we propose efficient GPU versions of three bandwidth finding algorithms and (c) we experimentally compare three of our GPU implementations with the ones which utilize only CPUs. Our experiments show orders of magnitude improvements over CPU implementations of classical algorithms.

8

Możliwość zastosowania obliczeń równoległych w elektroenergetyce

Drechny M.

Rynek Energii

|

2012

|

Nr 4

66--70

PL

Zagadnienia takie jak np. analiza wpływu przyłączenia nowych jednostek wytwórczych do systemu elektroenergetycznego, analiza pracy maszyn elektrycznych, złożone algorytmy automatyki zabezpieczeniowej czy prognozowanie produkcji energii elektrycznej za pomocą sieci neuronowych, do realizacji mogą wymagać zastosowania znacznych zasobów obliczeniowych. Zasoby te umożliwiają uzyskanie wyników symulacji (obliczeń) w możliwie krótkim czasie, przy czym nie zawsze jest to czas rzeczywisty. Zwiększenie szybkości wykonywanych obliczeń realizowane jest najczęściej przez zastosowanie odpowiednio zmodyfikowanych algorytmów obliczeniowych wykonywanych na klasycznych procesorach lub procesorach sygnałowych oraz przez zastosowanie obliczeń równoległych. Skupiając się na obliczeniach równoległych zauważyć można od kilku lat tendencję zastosowania procesorów graficznych, których budowa zoptymalizowana jest w kierunku wykonywania wielu takich samych operacji na różnych danych wejściowych (architektura SIMT - ang. Single Instruction Multiple Thread). W pracy przedstawiono przykłady użycia w elektroenergetyce obliczeń równoległych wykorzystujących do tego celu procesory graficzne. Scharakteryzowano możliwości zastosowania do obliczeń tego typu procesorów oraz przedstawiono prosty przykład porównujący szybkość obliczeń z użyciem klasycznego procesora i procesora graficznego.

EN

Issues such as impact of the analysis of the connection of new generation units to power system operation analysis of electrical machines, complex algorithms protective automation and power production forecasting using artificial neural networks to be implemented may require significant computational resources. These resources allow you to obtain the simulation results (calculations) in the shortest possible time, but it is not always the real time. In-creasing the speed of the calculation is carried out mostly by the use of suitably modified algorithms performed on classical computational processors and signal processors, and by the use of parallel computing. Focusing on the paral-lel computing, for the last several years can be seen the tendency to use graphics processors, which structure is opti-mized towards the performance of many of the same operations on different input data (SIMT architecture - Single Instruction Multiple Thread). In the paper are presented examples of application in the power engineering parallel computing which make use for this purpose graphics processors, characterized the applicability of the calculations of this type of processors and showed a simple example the speed of calculations by the use of classical CPU and GPU.

9

Cryptanalysis of the Full AES Using GPU-Like Special-Purpose Hardware

Biryukov A., Großschädl J.

Fundamenta Informaticae

|

2012

|

Vol. 114, nr 3/4

221-237

EN

The block cipher Rijndael has undergone more than ten years of extensive cryptanalysis since its submission as a candidate for the Advanced Encryption Standard (AES) in April 1998. To date, most of the publicly-known cryptanalytic results are based on reduced-round variants of the AES (respectively Rijndael) algorithm. Among the few exceptions that target the full AES are the Related-Key Cryptanalysis (RKC) introduced at ASIACRYPT 2009 and attacks exploiting Time- Memory-Key (TMK) trade-offs such as demonstrated at SAC 2005. However, all these attacks are generally considered infeasible in practice due to their high complexity (i.e. 2^99.5 AES operations for RKC, 2^80 for TMK). In this paper, we evaluate the cost of cryptanalytic attacks on the full AES when using special-purpose hardware in the form of multi-core AES processors that are designed in a similar way as modern Graphics Processing Units (GPUs) such as the NVIDIA GT200b. Using today's VLSI technology would allow for the implementation of a GPU-like processor reaching a throughput of up to 10^12 AES operations per second. An organization able to spend one trillion US$ for designing and building a supercomputer based on such processors could theoretically break the full AES in a time frame of as little as one year when using RKC, or in merely one month when performing a TMK attack. We also analyze different time-cost trade-offs and assess the implications of progress in VLSI technology under the assumption that Moore’s law will continue to hold for the next ten years. These assessments raise some concerns about the long-term security of the AES.

10

Wspomaganie sprzętowe do wyznaczenia statystyk obrazów naturalnych wyższego rzędu

Tomaszewska A.

Pomiary Automatyka Kontrola

|

2011

|

R. 57, nr 8

899-901

PL

Statystyki obrazów naturalnych, definiowanych jako nieprzetworzone obrazy rejestrowane przez człowieka, charakteryzują się dużą regularnością. Ich cechy wykorzystywane są w wielu aplikacjach grafiki komputerowej takich jak usuwanie szumu, czy kompresja. W artykule przedstawiono algorytm do szybkiego obliczenia statystyk wyższego rzędu na podstawie współczynników falek z wykorzystaniem programowalnego procsora graficznego. W rezultatach przedstawiono wyniki przyspieszenia uzyskanego przy wykorzystaniu GPU w porównaniu z implementacją na CPU.

EN

A natural image is unprocessed reproduction of a natural scene observed by a human. The Human Visual System (HVS), during its evolution, has been adjusted to the information encoded in natural images. Computer images are interpreted best by a human when they fit natural image statistics that can model the information in natural images. The main requirement of such statistics is their striking regularity. It hepls separate the information from noise, reconstruct information which is not avaiable in an image, or only partially avaiable. Other applications of statistics is compression, texture synthesis or finding distortion model in image like blur kernel. The statistics are translation and scale invariant, therefore a distribution of statistics does not depend on the object position in the image and on its size. In this paper there are presented higher order natural image statistics calculations based on GPU. The characteristic of the statistics is that they are independent of the scale and rotation transformations. Therefore, they are suitable for many graphic applications. To analyze images there is used statistics computed in the wavelet domain and there is considered the image contrast. The computation speedup is presented in the results. The paper is organized as follows: the overview of natural images sta-tistics is introduced in Section 2. In Section 3 the GPU-based implementation is described. The obtained results are given in Section 4. Finally, there are presented the concluding remarks.

11

Efekt rozpraszania podpowierzchniowego z wykorzystaniem programowalnego procesora graficznego

Tomaszewska A., Stefanowski K.

Pomiary Automatyka Kontrola

|

2011

|

R. 57, nr 8

930-932

PL

W artykule zaprezentowano sposób obliczania w czasie rzeczywistym efektu podpowierzchniowego rozpraszania światła w obiektach częściowo przeźroczystych przy zwróceniu szczególnej uwagi na wydajność obliczeniową algorytmu. Algorytm zaprojektowanego pod kątem implementacji sprzętowej realizowanej na programowalnym procesorze graficznym. Dane przekazywane są do GPU w postaci zmiennych (uniform i attribute), gdzie wykorzystywane są do dalszych obliczeń. Porównanie wydajności prezentowanego podejścia z innymi algorytmami przedstawiono w podsumowaniu artykułu.

EN

In the paper there is presented the spherical harmonics (SH) based method for subsurface scattering and its GPU-based implementation. The described approach is modification of the Green's algorithm [1]. The 3D model thickness was encoded for each vertex in every possible direction. The algorithm is divided into two parts: the preprocessing executed on CPU and the visualization stage designed for GPU. The tests were carried out and described. They revealed the effectiveness of the obtained results. To verify the results, they were compared with those obtained from other algorithms. The results show efficiency benefits of the authors' algorithm in comparison with the comparable quality approaches. Moreover, the modification of the Green`s algorithm improves the quality of the subsurface scattering effect, as the unnatural effect of sharp curves visible on the final images is reduced. It is possible because in this approach the way the light goes through an object depends on the model thickness. The paper is organized as follows. In Section 2 the previous works are discussed. In Section 3 the application of subsurface scattering based on the spherical harmonics and its hardware implementation are presented. Section 5 shows the obtained results. At the end of the paper there are given some concluding remarks.

12

Efficient Implementation of Stereovision Algorithms for Graphics Processing Unit in DirectShow Technology

Szajerman D., Pelczynski P.

Machine Graphics and Vision

|

2011

|

Vol. 20, No. 2

139-156

EN

Stereovision is a passive technique for estimation of depth in 3D scenes. Unfortunately, depth estimation in this imaging technique is computationally demanding. We show that stereovision matching algorithms can be efficiently mapped onto the present-day graphics processing units (GPUs). A number of modifications to the original image disparity estimation algorithm have been proposed that make running its computation on GPU platforms particularly efficient. A complete depth estimation system was implemented in GPU, covering correction of camera distortions, image rectification and disparity estimation. To obtain modularity of developed software, the DirectShow multimedia technology was used. Examples, computed depth maps are shown, and time performances of the proposed algorithms are outlined. The developed system has proved the usefulness of both GPU implementation and the DirectShow technology in scene depth estimation.

13

Wykorzystanie procesorów graficznych do szybkiego renderingu krajobrazu sferycznego

Tomaszewska A., Osobniak O.

Pomiary Automatyka Kontrola

|

2010

|

R. 56, nr 7

790-792

PL

W artykule zaprezentowano sposób generowania w czasie rzeczywistym planety o dużej powierzchni oraz wysokim poziomie szczegółowości. Algorytm opracowano na podstawie techniki wykorzystującej mapy obcięcia geometrii, umożliwiając generowanie na bieżąco dowolnego wycinka terenu na podstawie parametrów ustawienia kamery. Algorytm zaprojektowano pod kątem implementacji sprzętowej z wykorzystaniem programowalnego procesora graficznego oraz technologii CUDA.

EN

In the paper there is presented a fast method for large and detailed spherical terrain rendering. Rendering terrain with a high degree of realism is an ongoing need in real-time computer graphics applications. To render scenes of increased sizes and complexity, several terrain rendering algorithms have been proposed in the literature. One of the recent techniques called geometry clipmaps relies on the position of the viewpoint to create multi-resolution representation of the terrain, using nested meshes. In [1] there is proposed very efficient, GPU based approach of this technique for large terrain models. In the paper there are presented techniques which combine procedural approach and geometry clipmaps together. It enables rendering an arbitrary piece of terrain on fly based on the camera parameters. To improve the algorithm efficience most computations were performed on GPU with use of vertex and pixel shaders and CUDA technology. The paper is organized as follows: Section 2 discusses the previous works, Section 3 presents the application of procedural terrein generetion based on the clipmaps and its hardware implementation, whereas the results obtained are given in Section 4. Thge conclusions are presented at the end of the paper.

14

Effective real-time computer graphics processing based on depth-of-field effect

Tomaszewska A., Bazyluk B.

Pomiary Automatyka Kontrola

|

2010

|

R. 56, nr 7

675-677

EN

In this paper we present a GPU-based effect of an artificial depth of field, which varies with the distance from camera of the point that the user is looking at. Depth of field greatly enhances the scene's realism. The goal of our technique is the 3D approach with the user interaction that relies on the simulation of gaze point. Most of the computations are efficiently performed on the GPU with the use of vertex and pixel shaders.

PL

W artykule zaprezentowano sprzętową implementację efektu głębi widzenia. Opracowane podejście zoptymalizowano wykorzystując informację o punkcie skupienia wzroku użytkownika. W artykule główną uwagę skoncentrowano na efekcie zachowania ostrości wyłącznie dla tego fragmentu sceny, na którym w danym momencie skupiony jest wzrok obserwatora. Działanie algorytmu zaprojektowano pod kątem implementacji sprzętowej z wykorzystaniem programowalnych jednostek cieniowania wierzchołków oraz fragmentów. Do synchronizacji shaderów (programów uruchamanych na karcie) oraz transferu danych pomiędzy pamięcią główną a GPU wykorzystano procesor CPU, a wszystkie dane przechowano w postaci 32-bitowych tekstur. W implementacji, moduły algorytmu wykonujące operacje macierzowe korzystały z obiektu bufora ramki umożliwiającego generowanie wyniku do tekstury zamiast do standardowego bufora okna. W celu prezentacji efektu głębi widzenia stworzono aplikację umożliwiającą przetestowanie wydajności algorytmu wykorzystującego informację o punkcie skupienia wzroku uzyskując wzrost wydajności nawet do 40% w porównaniu z podejściem bez optymalizacji [2]. W rozdziale 2 artykułu zaprezentowano przegląd istniejących algorytmów symulujących efekt głębi widzenia. Prezentowane podejście oraz jego implementację sprzętową przedstawiono w rozdziale 3. Rezultaty działania metody zaprezentowano w rozdziale 4 a podsumowanie w rozdziale 5.

15

Metoda usuwania duchów na przykładzie akwizycji obrazów HDR z wykorzystaniem wspomagania sprzętowego GPU

Tomaszewska A., Markowski M.

Pomiary Automatyka Kontrola

|

2009

|

R. 55, nr 8

678-680

PL

W artykule zaprezentowano szybką i w pełni automatyczną technikę wykrywania i usuwania duchów, powstających w wyniku składania sekwencji zdjęć danej sceny. Prezentowane podejście umożliwia rejestrację sceny bez konieczności wykorzystania specjalistycznego sprzętu. Działanie algorytmu zaprojektowanego pod kątem implementacji sprzętowej z wykorzystaniem procesora GPU oraz zaprezentowano na przykładzie algorytmu akwizycji obrazów o szerokim zakresie dynamiki, weryfikując jego poprawność za pomocą algorytmu HDR VDP.

EN

In the paper we present the method for fast and full automatic approach for ghosts removal on programmable graphics hardware. The technique is based on probability maps that are calculated with comparison function from sequences of hand-held photographs. In practice, several basic problems occur when taking an image sequence. First, the camera is moving which causes images to misalign. This results in a blurry image. Secondly, objects are in movement causing ghost artifacts. In the paper we present a technique for acquisition of non-static scenes. The algorithm we implement as a part of system for acquisition of hand-held high dynamic range (HDR) images. Our application of this technique allows to create correct HDR image based on a simple sequence of the LDR (Low Dynamic Range) photographs with overlapped ghost regions. Additionally, the application aligns photographs and provides image de-noising. Most of computations are efficiently performed on GPU with the use of vertex and pixel shaders. We compare the performance of GPU-based implementation with standard approach and validated our results via HDR VDP (ang. High Dynamic Range Visual Difference Predicator) algorithm. The paper is organized as follows. In section 2 previous works are discussed. In section 3, the application of our HDR acquisition technique and its hardware implementation are presented. Section 4 shows achieved results. Finally we have concluded the paper.

16

GPU-based multi-layer perceptron as efficient method for approximation complex light models in per-vertex lighting

Pietras K., Rudnicki M.

Studia Informatica : systems and information technology

|

2005

|

Vol. 2(6)

53--63

EN

This paper describes a display method of the sky color on GeForce FX hardware. Lighting model used here is taken from “Display of the Earth taking into account atmospheric scattering” by Tomoyuki Nishita et.al., however this model is not the only suitable one in the proposed method.