Wyniki wyszukiwania - BazTech

1

The scalability in terms of the time and the energy for several matrix factorizations on a multicore machine

Bylina Beata, Piekarz Monika

Annals of Computer Science and Information Systems

|

2023

|

Vol. 35

895--900

EN

Scalability is an important aspect related to time and energy savings on modern multicore architectures. In this paper, we investigate and analyze scalability in terms of time and energy. We compare the execution time and consumption energy of the LU factorization (without pivoting) and Cholesky, both with Math Kernel Library (MKL) on a multicore machine. In order to save the energy of these multithreaded factorizations, the dynamic voltage and frequency scaling (DVFS) technique was used. This technique allows the clock frequency to be scaled without changing the implementation. An experimental scalability evaluation was performed on an Intel Xeon Gold multicore machine, depending on the number of threads and the clock frequency. Our test results show that scalability in terms of the execution time expressed by the Speedup metric has values close to a linear function with an increase in the number of threads. In contrast, scalability in terms of the energy consumed expressed by the Greenup metric has values close to a logarithmic function with an increase in the number of threads. Both kinds of scalability depend on the clock frequency settings and the number of threads.

2

Impact of processor frequency scaling on performance and energy consumption for WZ factorization on multicore architecture

Bylina Beata, Bylina Jarosław, Piekarz Monika

Annals of Computer Science and Information Systems

|

2023

|

Vol. 35

377--383

EN

With the growing demand for computing power, new multicore architectures have emerged to provide better performance. Reducing their energy consumption is one of the main challenges in achieving high performance computing. Current research trends develop new software and hardware techniques to achieve the best performance and energy compromise. In this work, we investigate the effect of processor frequency scaling using Dynamic Voltage Frequency Scaling on performance and energy consumption for the WZ factorization. This factorization is implemented both without optimization techniques and with strip mining. This technique involves transforming the program loop to improve program performance. Based on time and energy tests, we have shown that for the WZ factorization algorithm, regardless of the presence of manual optimization, it pays to reduce the frequency to save energy without losing performance. The conclusion can be extended to analogous algorithms - also having a high ratio of memory access to computational operations.

3

Influence of loop transformations on performance and energy consumption of the multithreded WZ factorization

Bylina Beata, Bylina Jarosław, Piekarz Monika

Annals of Computer Science and Information Systems

|

2022

|

Vol. 30

479--488

EN

High-level loop transformations are a key instrument to effectively exploit the resource in modern architectures. Energy consumption on multi-core architectures is one of the major issues connected with high-performance computing. We examine the impact of four loop transformation strategies on performance and energy consumption. The investigated strategies include: loop fission, loop interchange (permutation), strip-mining, and loop tiling. Additionally, a column-wise and row-wise store formats for dense matrices are considered. Parallelization and vectorization are implemented using OpenMP directives. As a test, the WZ factorization algorithm is used. The comparison of selected strategies of the loop transformation is done for Intel architecture, namely Cascade Lake. It has been shown that for WZ factorization, which is an example of an application in which we can use the loop transformation, optimization towards high-performance can also be an effective strategy for improving energy efficiency. Our results show also that block size selection in loop tilling has a significant impact on energy consumption.

4

The impact of vectorization and parallelization of the slope algorithm on performance and energy efficiency on multi-core architecture

Bylina Beata, Potiopa Joanna, Klisowski Michał, Bylina Jarosław

Annals of Computer Science and Information Systems

|

2021

|

Vol. 25

283--290

EN

Calculation of land-surface parameters (e.g. slope, aspect, curvature) is an important part of many geospatial analyses. Current research trends are aimed at developing new software techniques to achieve the best performance and energy trade-off. In our work, we concentrate on the vectorization and parallelization to improve overall energy efficiency and performance of the neighborhood raster algorithms for the computation of land-surface parameters. We chose the slope calculation algorithm as the basis for our investigation. The parallelization was achieved through redesigning the the original sequential code with OpenMP SIMD vectorization hints for compiler, OpenMP loop parallelization, and the hybrid of these techniques. To evaluate both performance and energy savings, we tested our vector-parallel implementations on a multi-core computer for various data sizes. RAPL interface was used to measure energy consumption. The results showed that optimization towards high performance can also be an effective strategy for improving energy efficiency.

5

The parallel tiled WZ factorization algorithm for multicore architectures

Bylina Beata, Bylina Jarosław

International Journal of Applied Mathematics and Computer Science

|

2019

|

Vol. 29, no. 2

407--419

EN

The aim of this paper is to investigate dense linear algebra algorithms on shared memory multicore architectures. The design and implementation of a parallel tiled WZ factorization algorithm which can fully exploit such architectures are presented. Three parallel implementations of the algorithm are studied. The first one relies only on exploiting multithreaded BLAS (basic linear algebra subprograms) operations. The second implementation, except for BLAS operations, employs the OpenMP standard to use the loop-level parallelism. The third implementation, except for BLAS operations, employs the OpenMP task directive with the depend clause. We report the computational performance and the speedup of the parallel tiled WZ factorization algorithm on shared memory multicore architectures for dense square diagonally dominant matrices. Then we compare our parallel implementations with the respective LU factorization from a vendor implemented LAPACK library. We also analyze the numerical accuracy. Two of our implementations can be achieved with near maximal theoretical speedup implied by Amdahl’s law.