Search results - BazTech

1

Accelerating computation of a reduced order model of a structural system resulting from Craig–Bampton reduction using GPU programming

Górecki Piotr, Kalinowski Miłosz, Jeziorek Łukasz, Broniszewski Jakub, Koziara Tomasz

Computer Assisted Methods in Engineering and Science

|

2024

|

Vol. 31, no. 1

51--66

EN

The Craig–Bampton (CB) method is a well-known substructuring technique that reduces the size of a finite element model (FEM) using a set of vibration modes. For large FEA models, the reduction process can be computationally expensive, since it requires algebraic operations on FEM mode shapes and sparse FEM system matrices. In this paper, we investigate the potential of GPU parallel processing to speed up the solution of the system of linear equations that results from the CB reduction of a model of cyclic structures. A high-level Python-based approach, employing the CuPy, Ginkgo and STRUMPACK libraries on the GPU, is compared with an optimized Fortran code. In side-by-side comparisons using the same inputs, the Python-GPU code is run on a single GPU device and the Fortran code on a multi-core compute node. The CB reduction process was split into several parts, each dealing with a different kind of algebraic formulation of the problem. Performance comparisons focused on the sparse linear system solver, since it turned out to be the most time-consuming part. The results suggest that current GPU-based sparse linear solvers do not surpass the state-of-the-art CPU-based MKL PARDISO solver (at least up to 1M DOFs).
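For orientation, the textbook form of the Craig–Bampton reduction can be written out as below. This is a sketch of the standard formulation, not necessarily the exact formulation used in the paper; the symbols (boundary set b, interior set i, constraint modes \Psi, retained fixed-interface modes \Phi) are notation assumed here for illustration. The sparse solve with K_{ii} is presumably the kind of linear system whose cost the abstract refers to.

```latex
% Partition the degrees of freedom into boundary (b) and interior (i) sets
K = \begin{bmatrix} K_{bb} & K_{bi} \\ K_{ib} & K_{ii} \end{bmatrix},
\qquad
M = \begin{bmatrix} M_{bb} & M_{bi} \\ M_{ib} & M_{ii} \end{bmatrix}

% Constraint modes: one sparse linear solve with many right-hand sides
K_{ii}\,\Psi = -K_{ib}

% Fixed-interface normal modes: keep the m lowest modes as columns of \Phi
K_{ii}\,\phi_j = \omega_j^2\, M_{ii}\,\phi_j, \qquad j = 1,\dots,m

% Transformation to the reduced coordinates (u_b, q)
\begin{bmatrix} u_b \\ u_i \end{bmatrix}
  = T \begin{bmatrix} u_b \\ q \end{bmatrix},
\qquad
T = \begin{bmatrix} I & 0 \\ \Psi & \Phi \end{bmatrix}

% Reduced stiffness and mass matrices
\hat{K} = T^{\mathsf{T}} K\, T, \qquad \hat{M} = T^{\mathsf{T}} M\, T
```

The reduced model has as many unknowns as there are boundary DOFs plus retained modes, which is typically far smaller than the original system.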

2

Design of a CPU Heat Sink with Minichannel-Fins & its Thermal Analysis

Arzutuğ Mehmet Emin

Polish Journal of Chemical Technology

|

2023

|

Vol. 25, nr 3

89--100

EN

In this paper, the design and the thermal analysis of a tribled microprocessor cooler combining the advantages of strong swirl flow, minichannel-fins and CuO nanofluid are presented. The results are expected to contribute to the understanding of how parameters such as the Reynolds number in the minichannels and the CuO volume fraction (%) affect the cooling flux of the heat sink and the drop in microprocessor temperature. The results show that the overall performance of the heat sink cooled with the water–CuO-EG nanofluid mixture increases with increasing Re number and nanoparticle load (%) in the coolant. It was determined that the energy withdrawn from the microprocessor was 241 times higher than the energy generated under maximum CuO load and Re number conditions. In addition, the largest temperature decrease was measured at the maximum CuO load and maximum Re number.

3

Porting of finite element integration algorithm to Xeon Phi coprocessor-based HPC architectures

Krużel Filip, Banaś Krzysztof, Iacomo Mauro

Computer Assisted Methods in Engineering and Science

|

2023

|

Vol. 30, no. 4

427--459

EN

In the present article, we describe the implementation of the finite element numerical integration algorithm for the Xeon Phi coprocessor. The coprocessor was an extension of the many-core specialized unit for calculations, and its performance was comparable with that of the corresponding GPUs. Its main advantages were the built-in 512-bit vector registers and the ease of transferring existing codes from traditional x86 architectures. In the article, we move the code developed for a standard CPU to the coprocessor. We compare its performance with our OpenCL implementation of the numerical integration algorithm, previously developed for GPUs. The GPU code is tuned to fit the coprocessor by our auto-tuning mechanism. Tests included two types of tasks to solve, using two types of approximation and two types of elements. The obtained timing results allow the performance of highly optimized CPU and GPU codes to be compared with that of a Xeon Phi coprocessor. This article answers whether such massively parallel architectures perform better using the CPU or the GPU programming method. Furthermore, we have compared the Xeon Phi architecture with Intel’s i9 13900K CPU, the latest available at the time of writing. This comparison determines whether the old Xeon Phi architecture remains competitive in today’s computing landscape. Our findings provide valuable insights for selecting the most suitable hardware for numerical computations and the appropriate algorithmic design.

4

An Efficient Classification of Hyperspectral Remotely Sensed Data Using Support Vector Machine

Mahendra H. N., Mallikarjunaswamy S.

International Journal of Electronics and Telecommunications

|

2022

|

Vol. 68, no. 3

609--617

EN

This work presents an efficient hardware architecture of a Support Vector Machine (SVM) for the classification of hyperspectral remotely sensed data using the High Level Synthesis (HLS) method. The long classification time and high power consumption of traditional classification of remotely sensed data are the main motivation for this work. The presented work therefore helps to classify remotely sensed data in real time and to take immediate action during a natural disaster. An embedded SVM is designed and implemented on a Zynq SoC for the classification of hyperspectral images. The remotely sensed data sets are tested on different platforms and the performance is compared with existing works. The novelty of the proposed work is to extend the HLS-based FPGA implementation to the onboard classification system in remote sensing. The experimental results for selected data sets from different classes show that our architecture, implemented on the Zynq 7000, yields a delay of 11.26 μs and a power consumption of 1.7 W, which is substantially better than other Field Programmable Gate Array (FPGA) implementations using a Hardware Description Language (HDL) and than Central Processing Unit (CPU) implementations.

5

Wykorzystanie GPGPU do obliczeń ekspozycji ludności na narażenie pola elektrycznego

Wroński Jacek W., Rzeźniczak Krzysztof, Michalski Igor

Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne

|

2019

|

nr 6

291--294, CD

PL

W niniejszym artykule przedstawiono metodę wykorzystania procesorów graficznych do obliczeń wartości poziomów niejonizujących pól elektromagnetycznych, pochodzących od systemów radiokomunikacyjnych, stanowiących potencjalne źródło narażeń ludności na pole elektromagnetyczne. Czasy obliczeń porównano z metodami wykorzystującymi przetwarzanie równoległe na procesorach CPU.

EN

This article presents a method of using GPGPU to estimate the levels of human exposure to non-ionizing EMF originating from wireless systems. The calculation time on GPGPU is compared with the time taken by parallel calculations performed on the CPU.

6

Wykorzystanie CPU i GPU do obliczeń w Matlabie

Woźniak Jarosław

Journal of Computer Sciences Institute

|

2019

|

Vol. 10

32--35

PL

W artykule zostały przedstawione wybrane rozwiązania wykorzystujące procesory CPU oraz procesory graficzne GPU do obliczeń w środowisku Matlab. Porównywano różne metody wykonywania obliczeń na CPU, jak i na GPU. Zostały wskazane różnice, wady, zalety oraz skutki stosowania wybranych sposobów obliczeń.

EN

The article presents selected solutions using CPU processors and GPUs for calculations in the Matlab environment. Various methods of performing calculations on the CPU as well as on the GPU were compared. Differences, disadvantages, advantages and effects of using selected calculation methods have been indicated.

7

Implementacja metody momentów w heterogenicznym środowisku obliczeniowym CPU/GPU

Karwowski A., Topa T., Noga A.

Przegląd Telekomunikacyjny + Wiadomości Telekomunikacyjne

|

2016

|

nr 6

503--505, CD

PL

Opisano implementację metody momentów – sztandarowego narzędzia analizy zagadnień inżynierii pola elektromagnetycznego (anteny, kompatybilność EM, mikrofale) – w heterogenicznym środowisku obliczeniowym CPU/GPU niskobudżetowej stacji roboczej typu desktop. Wykazano możliwość znaczącej poprawy wydajności metody dzięki wykorzystaniu zdolności procesora wielordzeniowego i procesorów strumieniowych karty graficznej do przetwarzania równoległego.

EN

The paper describes an implementation of the Method of Moments, a tool for the analysis of various electromagnetic engineering problems (antennas, electromagnetic compatibility, microwaves), on the heterogeneous CPU/GPU platform of a typical low-cost desktop workstation. It demonstrates that a noticeable performance improvement of the method can be attained by exploiting the potential of both the multi-core CPU and the graphics card for parallel processing.

8

Effectiveness of Fast Fourier Transform implementations on GPU and CPU

Puchała D., Stokfiszewski K., Szczepaniak B., Yatsymirskyy M.

Przegląd Elektrotechniczny

|

2016

|

R. 92, nr 7

69--71

EN

In this paper, we present the results of a comparison of the effectiveness of selected variants of radix-2 Fast Fourier Transform (FFT) algorithms implemented on both Graphics (GPU) and Central (CPU) Processing Units. The considered algorithms differ in memory consumption and in the arrangement of data-flow paths, which affects global memory coalescing and cache memory exploitation. The obtained results make it possible to identify the variants of FFT algorithms that are best suited to GPU and CPU architectures, to confirm the advisability of GPU-oriented FFT calculations, and to formulate a guideline for implementations of fast algorithms for various linear transforms.

PL

W niniejszej pracy przedstawiono wyniki porównania efektywności wybranych wariantów algorytmów szybkiej transformaty Fouriera (FFT) typu radix-2 realizowanych zarówno dla procesorów graficznych (GPU) jak i typowych jednostek centralnych (CPU). Rozważane algorytmy różnią się zapotrzebowaniem pamięciowym oraz postaciami grafów przepływu danych, które mają wpływ na spójność wykorzystania pamięci globalnej oraz pamięci cache jednostek GPU i CPU. Uzyskane wyniki pozwalają na wskazanie wariantów algorytmów FFT, które są najlepiej dostosowane dla architektur GPU i CPU, pozwalają też potwierdzić celowość realizacji implementacji FFT zorientowanych na wykorzystanie jednostek GPU, a także sformułować ogólne wytyczne dla implementacji zorientowanych na wykorzystanie jednostek GPU algorytmów szybkich przekształceń liniowych.
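As a point of reference for the radix-2 FFT family compared in this record, the following is a minimal CPU-only sketch of the recursive decimation-in-time variant. It is not one of the variants evaluated in the paper, whose data-flow arrangements are not given in the abstract; the function names `fft_radix2` and `dft_naive` are illustrative assumptions.

```python
import cmath


def fft_radix2(x):
    """Recursive radix-2 decimation-in-time FFT.

    Illustrative sketch only; len(x) must be a power of two.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("input length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft_radix2(x[0::2])  # transform of even-indexed samples
    odd = fft_radix2(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Butterfly: combine the two half-size transforms with a twiddle factor
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def dft_naive(x):
    """Direct O(n^2) DFT, used here only to cross-check the FFT."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
        for k in range(n)
    ]
```

Each butterfly touches elements half a block apart, so the order in which butterflies and stages are laid out in memory determines the access pattern; the memory coalescing and cache effects mentioned in the abstract depend on that layout.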

9

Visualizing CPU Microarchitecture

Wojtowicz T.

Schedae Informaticae

|

2015

|

Vol. 24

197--210

EN

A deep understanding of microprocessor architecture, its internal structure and the mechanics of its operation is essential for engineers in fields such as computer science, integrated circuit design or embedded systems (including microcontrollers). Usually the CPU architecture is presented at the level of the ISA, the functional decomposition of the chip and data flows. In this paper we propose a more tangible, interactive and effective approach to presenting the CPU microarchitecture. Building on recent advances in the simulation of the MOS6502, one of the most successful microprocessors of all time, which started the personal computing revolution, we present a CPU visualisation framework. The framework supports showing CPU internals at various levels (from a single transistor, through logic gates, to registers, operation decoders and the ALU). It allows the execution of real code and detailed analysis of the fetch–decode–execute cycle, measurement of cycles per operation, and measurement of the CPU activity factor. The analysis tools provided by this framework will also enable us to propose transistor-level simulation speed improvements to the model in the future.

10

Java Based Transistor Level CPU Simulation Speedup Techniques

Wojtowicz T.

Schedae Informaticae

|

2015

|

Vol. 24

179--195

EN

Transistor-level simulation of a CPU, while very accurate, also poses a performance challenge. The MOS6502 CPU simulation algorithm is analysed and several optimisation techniques are proposed. Applying these techniques improved the transistor-level simulation speed by a factor of 3–4, bringing it on par with the fastest RTL-level simulations so far.

11

Security modules and CPU in intelligent passenger information system

Hejczyk T., Wszołek B., Gałuszka A., Kamiński G., Surma D.

Archives of Transport System Telematics

|

2015

|

Vol. 8, iss. 2

22--26

EN

This article presents selected components of the prototype of the Integrated System of Supporting Information Management in Passenger Traffic (the Polish acronym of the system is ZSIKRP Demonstrator+). The system offers a significantly expanded range of functionality, which corresponds to current market demands. Additionally, it has features distinguishing it from other products available on the market. Prototypes of the system are built in two versions: for electric (EMU-type) and diesel (PCS-type) vehicles. They will be installed on a demonstration scale under real conditions in Mazovia Railways and Regional Transport. The ZSIKRP system also focuses on ensuring the safety of travelers in both types of vehicles, through the installation of a fire alarm module in the vehicles. This makes it possible to transfer information about a possible emergency situation to the Supervision Center. The system also improves passenger comfort through wireless modules for Internet and Intranet access using “leaky cable” technology.

12

Digital processing methods of images and signals in electromagnetic infiltration process

Kubiak I.

Image Processing & Communications

|

2013

|

Vol. 18, no. 1

5--14

EN

The article presents the capabilities of the electromagnetic infiltration process in the presence of strong interfering signals. The infiltration process was supported by digital processing of signals and images in the form of histogram transformations, global and local thresholding of signal amplitudes, and logical filters. The material presented in the article shows that uncontrolled use of a computer can create risks that may affect our safety and the security of our data. Applied to the obtained images, manipulation of histograms, thresholding of the amplitudes of the emission signal correlated with the classified signals, and logical filters highlight the weakness of the security used at the source. The presence of strong interfering signals, such as vertical and horizontal synchronization signals that block measurement receivers, does not prevent the reproduction of classified information. The possibilities of electromagnetic infiltration when compromising emissions are weak are also presented.

13

Stereoscopic video chroma key processing using NVIDIA CUDA

Sagan J.

Annales Universitatis Mariae Curie-Skłodowska. Sectio AI, Informatica

|

2013

|

Vol. 13, no. 1

81--87

EN

In this paper, I use the NVIDIA CUDA technology to perform the chroma key algorithm on stereoscopic images. NVIDIA CUDA allows parallel algorithms to be run on the GPU. The input data are stereoscopic images with a monochromatic background and the destination background image. The output is the combination of the inputs obtained using the chroma key. I compare the efficiency of the algorithm between GPU and CPU execution.
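The chroma key step described in this record can be sketched on the CPU as follows. This is a hedged illustration, not the author's CUDA implementation: the abstract does not specify the colour-matching rule, so the squared RGB distance with a tolerance `tol`, and the names `chroma_key` and `chroma_key_stereo`, are assumptions made here.

```python
def chroma_key(foreground, background, key, tol):
    """Replace foreground pixels close to the key colour with background pixels.

    Images are lists of rows of (r, g, b) tuples with identical dimensions.
    The Euclidean RGB distance threshold is an assumption for illustration.
    """
    out = []
    for fg_row, bg_row in zip(foreground, background):
        row = []
        for fg_px, bg_px in zip(fg_row, bg_row):
            dist2 = sum((a - b) ** 2 for a, b in zip(fg_px, key))
            # Pixels within the tolerance of the key colour become transparent
            row.append(bg_px if dist2 <= tol * tol else fg_px)
        out.append(row)
    return out


def chroma_key_stereo(left, right, background, key, tol):
    """Apply the same keying independently to both views of a stereo pair."""
    return (
        chroma_key(left, background, key, tol),
        chroma_key(right, background, key, tol),
    )
```

Every output pixel depends only on the corresponding input pixels, so the per-pixel loop body maps naturally onto one GPU thread per pixel, which is what makes the algorithm a good fit for CUDA.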

14

Application of GPU in the development of 3D hydrodynamics simulators for oil recovery prediction

Beisembetov I. K., Bekibaev T. T., Assilbekov B. K., Zhapbasbayev U. K., Kenzhaliev B. K.

AGH Drilling, Oil, Gas

|

2012

|

Vol. 29, no. 1

75--88

EN

This article studies the use of a computer's graphics card, through the CUDA architecture, in the prediction of oil recovery. CUDA is a parallel computing architecture developed by NVIDIA. It allows a dramatic increase in computing performance through the use of GPUs (graphics processors). Calculations were carried out on field models with 3 million grid blocks. The material balance equation was approximated with the IMPES method. The numerical modeling of oil recovery prediction ran dozens of times faster on the GPU than on the CPU.

PL

Artykuł przedstawia badania nad programem graficznym wykorzystywanym w planowaniu wtórnego wydobycia ropy naftowej z wykorzystaniem równoległego systemu obliczeniowego CUDA. CUDA jest systemem stworzonym przez firmę NVIDIA. Pozwala on na ogromne zwiększenie mocy obliczeniowej poprzez zastosowanie procesorów graficznych GPU. Porównane zostały wyniki osiągnięte od roku 2003 obliczone z wykorzystaniem zwykłego procesora CPU oraz procesora graficznego GPU. Obliczenia zostały wykonane na modelu złożowym wykonanym na siatce przestrzennej złożonej z 3 milionów komórek. Równanie bilansu masowego w przybliżeniu opisuje metoda przepływu dwufazowego w ośrodku porowatym typu IMPES. W rezultacie modelowania numerycznego wtórnego wydobycia ropy naftowej z wykorzystaniem procesora graficznego GPU, wyniki obliczeń uzyskano wielokrotnie szybciej niż w przypadku stosowania procesora typu CPU.

15

Akceleracja obliczeń komputerowych za pomocą układów graficznych z wykorzystaniem technologii CUDA

Stefanowicz Ł., Wiśniewski R., Wiśniewska M.

Pomiary Automatyka Kontrola

|

2011

|

R. 57, nr 8

954--956

PL

W artykule zaprezentowano możliwość zastosowania układów graficznych celem przyspieszenia obliczeń komputerowych. Przedstawiono technologię oraz architekturę CUDA firmy nVidia, a także podstawowe rozszerzenia względem standardów języka C. W referacie omówiono autorskie algorytmy testowe oraz metodykę badań, które przeprowadzono w celu określenia skuteczności akceleracji obliczeń komputerowych z wykorzystaniem procesorów graficznych GPU w porównaniu do rozwiązań tradycyjnych, opartych o CPU.

EN

The paper deals with the application of graphics processing units (GPUs) to the acceleration of computer operations and computations. Traditional computation methods are based on the Central Processing Unit (CPU), which has to handle all computer operations and tasks. Such a solution is particularly ineffective in distributed systems, where some sub-tasks can be performed in parallel. Running many threads in parallel can accelerate computing, which results in a shorter execution time. The paper presents the CUDA technology and architecture. CUDA technology is based on applying GPU processors to computation in order to achieve better performance than traditional methods, in which CPUs are used. GPU processors can perform multi-threaded calculations. Therefore, especially for tasks where concurrency can be applied, CUDA may greatly speed up the computation process. The effectiveness of CUDA technology was verified experimentally. The authors' own test modules were used for the investigations and experiments. The library of benchmarks consists of various algorithms, from simple iteration scripts to video processing methods. The results obtained from calculations performed on the CPU and on the GPU are compared and discussed.

16

Możliwości i perspektywy współczesnej grafiki komputerowej

Szuba T.

Roczniki Geomatyki

|

2009

|

T. 7, z. 6

97--103

EN

The paper addresses the question of what modern computer graphics is today and what its potential is. In terms of its "centre of gravity", modern computer graphics is moving away from art towards capturing the essence of the object or being to be modeled. In other words, the key problems for computer graphics are physical phenomena (e.g. liquids), mechanical properties (e.g. textiles, hair) or even the mental properties of virtual beings. Modern computer graphics therefore requires extremely high computational capabilities. Advanced computer games demonstrate this very well. With all this in mind, many researchers think that modern computer graphics is the main driving force in the development of modern computer science. On the basis of the above remarks, the paper attempts to summarize the application areas of modern computer graphics now and in the near future.

17

Usprawnienie wymiany informacji pomiędzy procesorami bitowo-bajtowej jednostki centralnej sterownika programowalnego

Chmiel M., Hrynkiewicz E.

Szybkobieżne Pojazdy Gąsienicowe

|

1999

|

nr 12

113--124

PL

W artykule przedstawiono kilka rozwiązań sprzętowych bitowo-bajtowej jednostki centralnej sterownika programowalnego, które zorientowane są na maksymalne zoptymalizowanie wymiany informacji pomiędzy procesorami współtworzącymi daną jednostkę. Optymalizacja ma na celu maksymalne wykorzystanie możliwości, jakie daje dwuprocesorowa struktura jednostki centralnej – chodzi przede wszystkim o dużą szybkość wykonywania instrukcji przez procesor bitowy oraz dużą funkcjonalność procesora bajtowego. Struktura taka powinna powodować, że procesory jak najczęściej pracują równocześnie (równolegle) oraz to, że rzadko występują sytuacje powodujące konieczność oczekiwania jednego procesora na drugi.

EN

The paper presents several hardware solutions for the bit-byte CPU of a PLC, aimed at optimising data exchange between the processors that make up the CPU. The optimisation is intended to make maximum use of the possibilities offered by the two-processor architecture of the CPU. The key point is to preserve the high instruction processing speed of the bit processor and the high functionality of the byte processor. The optimal structure should enable the processors to work in parallel for as long as possible and minimise the situations in which one processor has to wait for the other.