Search results - BazTech

1

Porting of finite element integration algorithm to Xeon Phi coprocessor-based HPC architectures

Krużel Filip, Banaś Krzysztof, Iacomo Mauro

Computer Assisted Methods in Engineering and Science | 2023 | Vol. 30, no. 4 | 427--459

EN

In the present article, we describe the implementation of the finite element numerical integration algorithm for the Xeon Phi coprocessor. The coprocessor was an extension of the many-core specialized unit for calculations, and its performance was comparable with the corresponding GPUs. Its main advantages were the built-in 512-bit vector registers and the ease of transferring existing codes from traditional x86 architectures. In the article, we move the code developed for a standard CPU to the coprocessor. We compare its performance with our OpenCL implementation of the numerical integration algorithm, previously developed for GPUs. The GPU code is tuned to fit the coprocessor by our auto-tuning mechanism. Tests included two types of tasks to solve, using two types of approximation and two types of elements. The obtained timing results allow the performance of highly optimized CPU and GPU codes to be compared with that of the Xeon Phi coprocessor. This article answers whether such massively parallel architectures perform better using the CPU or GPU programming method. Furthermore, we have compared the Xeon Phi architecture with Intel’s i9 13900K, the latest CPU available when writing this article. This comparison determines whether the old Xeon Phi architecture remains competitive in today’s computing landscape. Our findings provide valuable insights for selecting the most suitable hardware for numerical computations and the appropriate algorithmic design.

2

Analysis of the Use of Undervolting to Reduce Electricity Consumption and Environmental Impact of Computers

Muc Adam, Muchowski Tomasz, Kluczyk Marcin, Szeleziński Adam

Rocznik Ochrona Środowiska | 2020 | Tom 22, cz. 2 | 791--808

EN

This paper presents a method of lowering the processor’s voltage and the computer’s operating temperature by performing an operation called undervolting. By using undervolting it is possible to reduce electricity consumption and the amount of heat generated by computer workstations by up to 30%. This problem is particularly relevant for institutions that use a large number of computers. The higher the computational load on the computers, the more effective undervolting is. Undervolting the processor does not reduce its performance, but it lowers its operating temperature and has a positive impact on its life span and power consumption. Maintaining a low operating temperature for computer hardware is essential to reduce operating and repair costs. The paper also presents the results of environmental research aimed at assessing the validity and effectiveness of undervolting.

PL

W pracy przedstawiono metodę obniżania napięcia procesora i temperatury pracy komputera poprzez wykonanie operacji zwanej undervoltingiem. Przez zastosowanie undervoltingu można obniżyć nawet o 30% zużycie energii elektrycznej i ilość wydzielanego ciepła przez stanowiska komputerowe. Problem ten jest szczególnie istotny w przypadku instytucji, które korzystają z dużej liczby komputerów. Skuteczność mechanizmu jest tym większa im komputery poddane undervoltingowi są bardziej obciążone obliczeniowo. Wykorzystywanie undervoltingu w konfiguracji procesora nie zmniejsza jego wydajności, a obniża jego temperaturę pracy, wpływa pozytywnie na jego żywotność i zużycie energii elektrycznej. Utrzymanie dobrej kultury pracy sprzętu komputerowego jest kluczowe, by obniżyć koszty eksploatacji oraz napraw. W pracy przedstawiono również wyniki badań środowiskowych, których celem była ocena zasadności i efektywności stosowania undervoltingu.

3

Chłodzenie adiabatyczne – zasady pracy i doboru

Wesołowski A.

Technika Chłodnicza i Klimatyzacyjna | 2018 | nr 11-12 | 422--428

PL

Autor omawia niezwykle ważne zagadnienia dotyczące chłodzenia serwerów w „Data Centers” (Centralach Przetwarzania Danych - CPD). Elementem składowym i najważniejszym każdego serwera jest procesor, który w czasie swojej pracy wydziela duże ilości ciepła. Ciepło to należy w sposób jak najefektywniejszy odprowadzić do otoczenia, aby serwer mógł w miarę efektywnie i sprawnie pracować. Z doświadczeń firm komputerowych wynika, że ze wzrostem temperatury serwera jego szybkość przetwarzania danych jest wolniejsza. W opracowaniu przedstawione zostały obecne i potencjalne tendencje chłodzenia CPD z uwzględnieniem metod bezpośredniego chłodzenia procesorów.

EN

Important problems of server cooling in Data Centres are discussed. The core part of each server is the processor, which emits large amounts of heat during operation. This heat should be removed efficiently in order to ensure proper working conditions for the processor. The higher the temperature of the processor, the lower its data processing rate. The paper describes current and possible methods of equipment cooling in Data Centres.

4

A proposed round robin scheduling algorithm for enhancing performance of CPU utilization

Phorncharoen S., Sa-Ngiamvibool W.

Przegląd Elektrotechniczny | 2018 | R. 94, nr 4 | 26--29

EN

An important problem in an operating system is CPU scheduling. This paper proposes a round robin (RR) scheduling algorithm, named DevRR, with a new dynamic time quantum (TQ) computed from the standard deviation and average burst time of the processes in a queue. The performance of DevRR is compared to standard RR, PRR, and BRR in terms of reductions in average waiting time (AWT), average turnaround time (ATT), and number of context switches (NCS). For a 50-process data set, DevRR reduces AWT by 22.97%, ATT by 22.13%, and NCS by 30.26%.

PL

W artykule zaproponowano algorytm procesora z dynamicznym czasem kwantowym określanym jako odchyłka standardowa i średni czas impulsu każdego procesu w kolejkowaniu. Właściwości algorytmu porównano z innymi standardowymi metodami pod kątem oceny czasu oczekiwania.
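
The abstract above does not give the DevRR formula, so the following is only a minimal sketch of the general idea of a dynamic time quantum derived from the mean and standard deviation of burst times. The rule used here (quantum = mean + population standard deviation of the remaining bursts, recomputed each round) and the names `dev_quantum` and `simulate_rr` are assumptions made for illustration, not the authors' method. All processes are assumed to arrive at time 0.

```python
from statistics import mean, pstdev


def dev_quantum(remaining):
    """Hypothetical rule (an assumption, not the published DevRR formula):
    quantum = mean + population standard deviation of remaining burst times."""
    return max(1, round(mean(remaining) + pstdev(remaining)))


def simulate_rr(bursts, quantum_fn):
    """Simulate round robin for processes that all arrive at time 0.

    quantum_fn receives the remaining burst times of the ready processes
    at the start of each round and returns the time quantum for that round.
    Returns (average waiting time, average turnaround time, dispatch count).
    """
    remaining = list(bursts)
    finish = [0] * len(bursts)
    ready = list(range(len(bursts)))
    clock = 0
    dispatches = 0
    while ready:
        tq = quantum_fn([remaining[i] for i in ready])
        next_round = []
        for i in ready:
            run = min(tq, remaining[i])
            clock += run
            remaining[i] -= run
            dispatches += 1
            if remaining[i] == 0:
                finish[i] = clock          # turnaround equals finish time
            else:
                next_round.append(i)       # not done: back to the queue
        ready = next_round
    waiting = [f - b for f, b in zip(finish, bursts)]
    return mean(waiting), mean(finish), dispatches


if __name__ == "__main__":
    bursts = [5, 3, 8]
    print("fixed TQ=2:", simulate_rr(bursts, lambda rem: 2))
    print("dynamic TQ:", simulate_rr(bursts, dev_quantum))
```

On this toy input the dynamic quantum lets short jobs finish in their first turn, which lowers both the average waiting time and the number of dispatches compared with a fixed quantum of 2; the percentages reported in the abstract come from the authors' own data set and are not reproduced here.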

5

Cooling of a processor with the use of a heat pump

Lipnicki Z., Lechów H., Pantoł K.

Civil and Environmental Engineering Reports | 2018 | Vol. 28, no. 1 | 16--25

EN

In this paper the problem of cooling a component in whose interior heat is generated during operation was solved analytically. The problem of cooling a processor with the use of a heat pump was solved based on the authors’ earlier theoretical analysis of cooling the external surface of the component by liquid evaporation. Cases of stationary and non-stationary cooling were solved as well. The authors created a simplified non-stationary analytical model describing the phenomenon, which made it possible to determine the heat distribution within the component, the contact temperature between the component and the liquid layer, and the thickness of the evaporating substance layer as functions of time. Numerical calculations were performed and appropriate charts were drawn. The analytical solutions obtained earlier allowed conclusions to be drawn which might help electronics engineers design similar cooling systems. Model calculations were performed for a cooling system using a compressor heat pump as an effective method of cooling.

PL

Przedstawiono analityczne rozwiązanie równania chłodzenia jednostki, w której wytwarzane jest ciepło. Z tego powodu opracowano uproszczony, niestacjonarny model określania rozkładu temperatury w jednostce, temperatury kontaktu między jednostką a warstwą cieczy oraz grubości warstwy parowania w funkcji czasu. Podano teoretyczną analizę zewnętrznego chłodzenia jednostki poprzez uwzględnienie zjawiska parowania cieczy za pomocą równań Fouriera i Poissona. Pokazano zarówno stacjonarny, jak i niestacjonarny opis chłodzenia. Uzyskane wyniki symulacji wydają się przydatne przy projektowaniu podobnych układów chłodzenia. Wykonywany jest również tryb obliczeniowy dla układów chłodzenia wyposażonych w pompę ciepła sprężarki, jako efektywnej metody chłodzenia.

6

Application of ASIP in Embedded Design with Optimized Clock Management

Venkanna M., Rao R., Sekhar P. Ch.

Annals of Computer Science and Information Systems | 2018 | Vol. 14 | 159--163

EN

As the demand for high performance computing increases, new approaches have to be found to automate the design of embedded processors. At the same time, new tools have to be developed to shorten execution time and simplify design, reducing time to market. These are to be applied to the system architecture to achieve rapid exploration under power consumption, chip area, and performance constraints. This has considerably increased interest in the design and application of Application Specific Instruction Processors (ASIPs). An ASIP has higher flexibility than dedicated hardware. The current case study focuses on an ASIP design methodology that considers the classical parameters of computational performance and area simultaneously with energy consumption. In this paper, clock gating is analyzed and designed. It is then optimized using a fast genetic algorithm (FastGA). The optimization result is shown for the ICORE (ISS-core) ASIP for DVB-T acquisition and tracking algorithms. Observation shows a potential energy saving of about one order of magnitude from the optimization.

7

Zwiększanie wydajności mikroprocesorów z wykorzystaniem informacji o otoczeniu

Marzec P., Kos A., Fluder P.

Przegląd Elektrotechniczny | 2017 | R. 93, nr 8 | 39--42

PL

W artykule autorzy przedstawiają system służący do praktycznej realizacji modelu kontroli temperatury układu scalonego uwzględniającego zmienne warunki otoczenia. Proponowany model umożliwia poprawę wydajności obliczeniowej układu scalonego, przy zachowanej stałej temperaturze pracy układu. Przedstawiona została fizyczna realizacja regulatora ΔT oraz wyniki działania systemu w odniesieniu do standardowego rozwiązania.

EN

In the article the authors present a system for the practical implementation of a temperature control model for an integrated circuit that takes into account changing ambient conditions. The proposed solution improves computing performance while maintaining a constant operating temperature of the integrated circuit (microprocessor). The paper presents a physical realization of the ΔT control system and the computational performance of the system in relation to the standard solution.

8

Concept of instruction set driven designing of CPU

Pawłowski M., Skorupski A., Szymański Z., Gracki K.

Elektronika : konstrukcje, technologie, zastosowania | 2015 | Vol. 56, nr 10 | 95-97

EN

The paper presents the idea of designing processors from a preset instruction list. Each instruction is implemented as a functional logic block attached to a common bus. Each of these blocks contains the execution and control elements necessary for executing the instruction. The processor is a combination of several dozen such blocks, of which only one becomes active once the instruction code is recognized. The individual instruction blocks are described in VHDL, and the whole processor can be built in an FPGA.

PL

W artykule przedstawiono koncepcję projektowania procesorów za pomocą listy rozkazów. Każdy z rozkazów stanowi w pełni funkcjonalny blok logiczny, dołączony do wspólnych magistral i zawierający elementy wykonawcze i sterujące, które są niezbędne do jego wykonania. Procesor jest połączeniem kilkudziesięciu takich bloków, z których tylko jeden podejmuje działanie po rozpoznaniu swojego kodu rozkazu. Procesor jest realizowany w układzie FPGA, dlatego opis poszczególnych bloków rozkazowych jest projektowany w języku VHDL.

9

Nowe podejście do projektowania chłodzenia serwerów

Wesołowski A.

Chłodnictwo i Klimatyzacja | 2015 | nr 11 | 28--32

EN

In this article I focus on issues of cooling processors, which are the basic component of every server located in Data Processing Centres. During operation a processor emits large amounts of heat, which must be removed so that the server can work reasonably efficiently.

10

Estymacja czasów wykonywania algorytmu sterującego w zależności od platformy sprzętowej na użytek diagnostyki obiektu mechanicznego

Kozłowska A.

Pomiary Automatyka Kontrola | 2013 | R. 59, nr 5 | 466--469

PL

Opracowanie systemów sterowania obiektami mechanicznymi polega na znalezieniu kompromisu między szybkością działania, a wymaganą dokładnością i jest zagadnieniem o dużej złożoności obliczeniowej. W artykule przedstawiono różne implementacje algorytmu Optymalizacji Rojem Cząstek PSO (ang. Particle Swarm Optimization), który stworzono w celu uzyskania minimalnego czasu obróbki przy zachowaniu zadanej dokładności odtwarzania trajektorii ruchu. Jego działanie zostało porównane w językach: C, C++ i C# oraz na procesorze i karcie graficznej. Z przeprowadzonych badań wynika, że dla małej liczby punktów obliczenia na karcie graficznej są wolniejsze niż na procesorze.

EN

Finding the compromise between speed and accuracy is the most important problem in designing control systems. This is a problem of high computational complexity. The paper presents implementations of the PSO (Particle Swarm Optimization) algorithm, whose operation has been compared in several programming environments (C/OpenCL, C#/Cloo, and C++) and on several hardware platforms (CPU and graphics card processor, GPU). PSO is able to achieve the minimum processing time and the best possible mapping of a given trajectory. To compare the speed of the PSO algorithm, the time of test function minimization was measured. The paper describes three test functions commonly used to test optimization effectiveness. The results show that for a small number of points the calculations on a graphics card are slower than those performed on the CPU. The appropriate use of available parallel computing technologies can significantly improve the characteristics of a multi-axis machine, and the expenses incurred for optimizing the PSO can quickly yield significant profits. It should be noted that optimization of the processing speed is most needed where the machining is most complicated. The profit will be negligible for simple trajectories. In special cases, the optimization may extend the processing time without apparent improvement in trajectory mapping.
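
For readers unfamiliar with PSO, the test-function minimization mentioned in the abstract above can be sketched as follows. This is a generic textbook-style PSO in plain Python, not the authors' C, C++, C# or OpenCL code; the sphere test function, the parameter values (`w`, `c1`, `c2`, swarm size, iteration count) and the names `sphere` and `pso` are assumptions chosen for illustration, since the abstract does not name its three test functions.

```python
import random


def sphere(x):
    """Assumed test function: sum of squares, with minimum 0 at the origin."""
    return sum(v * v for v in x)


def pso(f, dim, n_particles=30, iters=200, lo=-5.0, hi=5.0,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over [lo, hi]^dim with a basic global-best PSO.

    Returns (best position found, best value found).
    """
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]           # personal best positions
    best_val = [f(p) for p in pos]           # personal best values
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]   # global best so far
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward personal best + pull toward global best
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


if __name__ == "__main__":
    position, value = pso(sphere, dim=2)
    print(position, value)
```

The inner loop over particles is the part that a GPU implementation would run in parallel; with only a few particles or points, the overhead of launching that parallel work can outweigh the gain, which is consistent with the abstract's observation about small problem sizes.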

11

Zwiększenie rynku użytkowników komputerów Macintosh jako rezultat działań innowacyjnych firmy Apple

Michalski A.

Zeszyty Naukowe. Organizacja i Zarządzanie / Politechnika Śląska | 2011 | z. 57 | 271-280

PL

Firma Apple znana jest ze swoich oryginalnych i ciekawych rozwiązań, jednakże ze względu na brak kompatybilności oprogramowania i formatów plików z komputerami PC segment rynku rodziny Macintosh był stosunkowo niewielki. Artykuł skupia się na analizie przyczyn i skutków innowacyjnej decyzji, związanej ze zmianą typu procesora w komputerach Mac, która zaowocowała zwiększeniem popularności tych komputerów.

EN

Apple Inc. is widely known for its original and interesting computer solutions, but for a long time the company held a relatively small market share because of the lack of PC compatibility. The paper gives a short analysis of both the reasons for the innovative decision to change the processor type used in Mac computers and the results of that decision.

12

Akceleracja obliczeń komputerowych za pomocą układów graficznych z wykorzystaniem technologii CUDA

Stefanowicz Ł., Wiśniewski R., Wiśniewska M.

Pomiary Automatyka Kontrola | 2011 | R. 57, nr 8 | 954-956

PL

W artykule zaprezentowano możliwość zastosowania układów graficznych celem przyspieszenia obliczeń komputerowych. Przedstawiono technologię oraz architekturę CUDA firmy nVidia, a także podstawowe rozszerzenia względem standardów języka C. W referacie omówiono autorskie algorytmy testowe oraz metodykę badań, które przeprowadzono w celu określenia skuteczności akceleracji obliczeń komputerowych z wykorzystaniem procesorów graficznych GPU w porównaniu do rozwiązań tradycyjnych, opartych o CPU.

EN

The paper deals with the application of graphics processing units (GPUs) to the acceleration of computer operations and computations. Traditional computation methods are based on the central processing unit (CPU), which has to handle all computer operations and tasks. Such a solution is particularly ineffective in distributed systems, where some sub-tasks can be performed in parallel. Many parallel threads can accelerate computing, which results in a shorter execution time. The paper presents the new CUDA technology and architecture. The idea of CUDA technology is based on using GPUs for computation to achieve better performance than the traditional CPU-based methods. GPUs can perform multi-threaded calculations. Therefore, especially for tasks where concurrency can be applied, CUDA may greatly speed up the computation process. The effectiveness of CUDA technology was verified experimentally. The authors' own test modules were used for the investigations and experiments. The library of benchmarks consists of various algorithms, from simple iteration scripts to video processing methods. The results obtained from calculations performed on the CPU and on the GPU are compared and discussed.

13

Vector calculations using the x86 processors family

Raczyński D.

Zeszyty Naukowe. Elektryka / Politechnika Opolska | 2010 | z. 63 | 49-50

EN

The purpose of this article is to present the basic features of the vector extensions introduced in the x86 family of processors. In order to compare the speed of programs using the vector extensions with those using scalar data, a few programs were developed, in particular programs performing operations on graphical BMP files, computing a definite integral using the rectangle method, and generating fractals.

14

Nierównomierne obciążenie procesorów w systemie wieloprocesorowym

Taborek K., Pogoda Z.

Elektronika : konstrukcje, technologie, zastosowania | 2009 | Vol. 50, nr 10 | 60-63

PL

W artykule przedstawiono przykład nierównomiernego obciążenia procesorów w systemie wieloprocesorowym ze wspólną pamięcią. Nieregularne obciążenie procesorów jest rozumiane w sensie różnej liczby zgłoszeń tych procesorów do pamięci globalnej oraz różnych intensywności tych zgłoszeń. Został zaproponowany bardzo użyteczny przypadek obciążenia nierównomiernego, którego zastosowanie w systemie wieloprocesorowym znacznie upraszcza analizę wydajności takiego systemu. Przedstawiono programową metodę generacji zgłoszeń procesorów w rzeczywistym systemie wieloprocesorowym. Zostały przedstawione schematy blokowe dwóch typów programów: dla procesora master i dla procesora slave.

EN

The paper presents an example of irregular load of processors in a multiprocessor system with common memory. Irregular load is understood as different numbers of requests from the processors to the global memory, together with different intensities of these requests. A very useful case of irregular processor load is proposed; applying this kind of load in a multiprocessor system makes performance analysis of the system considerably easier. A software method of generating processor requests in a real multiprocessor system is presented. Block diagrams of two types of programs, for the master processor and for the slave processor, are shown in figures.

15

Heat transfer from a pc processor to an air-cooled heat sink

Lemanowicz M., Wójcik J.

Chemical and Process Engineering | 2007 | Vol. 28, z. 1 | 57-66

EN

Heat transfer from PC processors to radial heat sinks cooled by air has been examined. The dimensionless equations derived could be used for the prediction of the heat transfer coefficient for the system. The influence of processor temperature on the operation of the fan is demonstrated.

PL

Przedstawiono wyniki badań odbioru ciepła z procesora komputera PC za pomocą radiatora promieniowego chłodzonego powietrzem. Otrzymane równania kryterialne mogą służyć do obliczenia współczynnika wnikania ciepła dla takiego układu. Pokazano, jak temperatura procesora wpływa na pracę wentylatora.

16

Optymalizacja oprogramowania przetwarzającego sygnały cyfrowe na procesorach PowerPC

Rosłoniec W.

Prace Przemysłowego Instytutu Telekomunikacji | 2006 | Supl. Nr 22 | 3-34

EN

The article discusses ways of increasing signal processing speed in processor systems that use PowerPC processors. It covers selected program code optimization techniques that improve cooperation between the processor's execution units and the memory subsystem, and presents methods of using the AltiVec vector units for digital signal processing.

17

Układowa implementacja języków wysokiego poziomu

Komorowski W.

Informatyka | 1998 | nr 7/8 | 17-24

EN

Many contemporary machines carry traces of their earlier, often primitive incarnations, which have survived, sometimes in vestigial form, thanks to manufacturers' efforts to preserve compatibility, and which together make up an eclectic architecture that is particularly troublesome, for example, when programming in assembly language.