Article title
Identifiers
Title variants
Publication languages
Abstracts
With the growing demand for computing power, new multicore architectures have emerged to provide better performance. Reducing their energy consumption is one of the main challenges in high-performance computing. Current research develops new software and hardware techniques to achieve the best compromise between performance and energy. In this work, we investigate the effect of processor frequency scaling, using Dynamic Voltage and Frequency Scaling (DVFS), on performance and energy consumption for the WZ factorization. The factorization is implemented both without optimization techniques and with strip mining, a technique that transforms a program loop to improve performance. Based on time and energy tests, we show that for the WZ factorization algorithm, regardless of the presence of manual optimization, it pays to reduce the frequency to save energy without losing performance. The conclusion can be extended to analogous algorithms that also have a high ratio of memory accesses to computational operations.
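Strip mining, as named in the abstract, can be illustrated with a minimal sketch. This is not the authors' WZ factorization code: the function names, the scaling kernel, and the strip length of 4 are assumptions chosen only to show how a single loop is split into an outer loop over strips and an inner loop within each strip.

```python
def scale_plain(x, a):
    """Reference version: one loop over all indices."""
    y = [0.0] * len(x)
    for i in range(len(x)):
        y[i] = a * x[i]
    return y


def scale_strip_mined(x, a, strip=4):
    """Same computation with the loop strip-mined.

    The single loop is split into an outer loop that steps over
    strips of `strip` elements and an inner loop over one strip.
    `strip` is a hypothetical tuning parameter, not a value from the paper.
    """
    n = len(x)
    y = [0.0] * n
    for start in range(0, n, strip):      # outer loop: one iteration per strip
        end = min(start + strip, n)       # the last strip may be shorter
        for i in range(start, end):       # inner loop: work within one strip
            y[i] = a * x[i]
    return y
```

Both functions return the same result for any input length; only the iteration structure differs. In compiled code the inner loop is typically the one targeted for vectorization or cache-friendly blocking.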
Year
Volume
Pages
377--383
Physical description
Bibliography: 24 items, charts, tables, illustrations.
Authors
author
- Institute of Computer Science, Marie Curie-Sklodowska University, Pl. M. Curie-Skłodowskiej 5, Lublin, 20-031, Poland
author
- Institute of Computer Science, Marie Curie-Sklodowska University, Pl. M. Curie-Skłodowskiej 5, Lublin, 20-031, Poland
author
- Institute of Computer Science, Marie Curie-Sklodowska University, Pl. M. Curie-Skłodowskiej 5, Lublin, 20-031, Poland
Bibliography
- [1] “Top500,” https://www.top500.org/, 2022.
- [2] J. J. Dongarra, C. B. Moler, J. R. Bunch, and G. W. Stewart, LINPACK users’ guide. SIAM, 1979.
- [3] “Green500,” https://www.top500.org/lists/green500/, 2022.
- [4] J. V. Lima, I. Raïs, L. Lefevre, and T. Gautier, “Performance and energy analysis of openmp runtime systems with dense linear algebra algorithms,” in 2017 International Symposium on Computer Architecture and High Performance Computing Workshops (SBAC-PADW), 2017, pp. 7–12.
- [5] M. Mirka, G. Devic, F. Bruguier, G. Sassatelli, and A. Gamatié, “Automatic energy-efficiency monitoring of openmp workloads,” in 2019 14th International Symposium on Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), 2019, pp. 43–50.
- [6] M. A. Shahneous Bari, M. M. Abid, A. Qawasmeh, and B. Chapman, “Performance and energy impact of openmp runtime configurations on power constrained systems,” Sustainable Computing: Informatics and Systems, vol. 23, pp. 1–12, 2019.
- [7] J. V. F. Lima, I. Raïs, L. Lefèvre, and T. Gautier, “Performance and energy analysis of OpenMP runtime systems with dense linear algebra algorithms,” The International Journal of High Performance Computing Applications, vol. 33, no. 3, pp. 431–443, 2019.
- [8] J. Dongarra, H. Ltaief, P. Luszczek, and V. M. Weaver, “Energy footprint of advanced dense numerical linear algebra using tile algorithms on multicore architectures,” in 2012 Second International Conference on Cloud and Green Computing, 2012, pp. 274–281.
- [9] L. Szustak, R. Wyrzykowski, T. Olas, and V. Mele, “Correlation of performance optimizations and energy consumption for stencil-based application on Intel Xeon scalable processors,” IEEE Transactions on Parallel and Distributed Systems, vol. 31, no. 11, pp. 2582–2593, 2020.
- [10] T. Jakobs and G. Rünger, “Examining energy efficiency of vectorization techniques using a Gaussian elimination,” in 2018 International Conference on High Performance Computing Simulation (HPCS), 2018, pp. 268–275.
- [11] A. Shahid, S. Arif, M. Qadri, and S. Munawar, “Power optimization using clock gating and power gating: A review,” in Innovative Research and Applications in Next-Generation High Performance Computing, Q. F. Hassan, Ed. IGI Global, 2016.
- [12] M. Weiser, B. Welch, A. Demers, and S. Shenker, “Scheduling for reduced cpu energy,” 1st OSDI, pp. 13–23, 1994.
- [13] A. Weissel and F. Bellosa, “Process cruise control - event-driven clock scaling for dynamic power management,” CASES, 2002.
- [14] R. C. Whaley, A. Petitet, and J. J. Dongarra, “Automated empirical optimizations of software and the ATLAS project,” Parallel Comput., vol. 27, no. 1-2, pp. 3–35, 2001. [Online]. Available: https://doi.org/10.1016/S0167-8191(00)00087-9
- [15] T. Jakobs, J. Lang, G. Rünger, and P. Stocker, “Tuning linear algebra for energy efficiency on multicore machines by adapting the ATLAS library,” Future Gener. Comput. Syst., vol. 82, pp. 555–564, 2018. [Online]. Available: https://doi.org/10.1016/j.future.2017.03.009
- [16] E. Garcia, J. Arteaga, R. S. Pavel, and G. R. Gao, “Optimizing the lu factorization for energy efficiency on a many-core architecture,” in International Workshop on Languages and Compilers for Parallel Computing, 2013. [Online]. Available: https://api.semanticscholar.org/CorpusID:489258
- [17] S. Donfack, J. Dongarra, M. Faverge, M. Gates, J. Kurzak, P. Luszczek, and I. Yamazaki, “A survey of recent developments in parallel implementations of Gaussian elimination,” Concurrency and Computation: Practice and Experience, vol. 27, no. 5, pp. 1292–1309, 2015.
- [18] J. J. Dongarra, M. Faverge, H. Ltaief, and P. Luszczek, “Achieving numerical accuracy and high performance using recursive tile LU factorization,” Concurrency and Computation: Practice and Experience, vol. 26, no. 6, pp. 1408–1431, 2013.
- [19] B. Bylina and J. Bylina, “Nested loop transformations on multi- and many-core computers with shared memory,” in Selected Topics in Applied Computer Science, J. Bylina, Ed. Lublin: Maria Curie-Skłodowska University Press, 2021, pp. 167–186, http://stacs.matrix.umcs.pl/v01/stacs_v01.pdf.
- [20] R. Chandra, L. Dagum, D. Kohr, D. Maydan, R. Menon, and J. McDonald, Parallel Programming in OpenMP. San Francisco: Morgan Kaufmann Publishers, 2001.
- [21] J. Bylina, B. Bylina, and M. Piekarz, “Influence of loop transformations on performance and energy consumption of the multithreded wz factorization,” Preproceedings of the 17th Conference on Computer Science and Intelligence Systems, pp. 479–488, 2022, https://annals-csis.org/proceedings/2022/pliks/251.pdf.
- [22] E. Rotem, A. Mendelson, A. Naveh, and M. Moffie, “Analysis of the enhanced intel® speedstep® technology of the pentium® m processor,” https://www.cs.virginia.edu/~skadron/tacs/rotem_slides.pdf, 2004.
- [23] “Amd powernow! technology dynamically manages power and performance,” https://www.amd.com/system/files/TechDocs/24404a.pdf, 2000.
- [24] K. De Vogeleer, G. Memmi, P. Jouvelot, and F. Coelho, “The energy/frequency convexity rule: modeling and experimental validation on mobile devices,” PPAM’2013, 2014.
- [25] D. Evans and M. Hatzopoulos, “A parallel linear system solver,” International Journal of Computer Mathematics, vol. 7, no. 3, pp. 227–238, 1979.
- [26] P. Yalamov and D. Evans, “The wz matrix factorisation method,” Parallel Computing, vol. 21, no. 7, pp. 1111–1120, 1995.
- [27] K. Khan, M. Hirki, T. Niemi, J. Nurminen, and Z. Ou, “RAPL in action: Experiences in using RAPL for power measurements,” ACM Transactions on Modeling and Performance Evaluation of Computing Systems (TOMPECS), vol. 3, 2018.
- [28] L. Szustak, R. Wyrzykowski, T. Olas, and V. Mele, “Correlation of performance optimizations and energy consumption for stencil-based application on Intel Xeon scalable processors,” IEEE Transactions on Parallel and Distributed Systems, vol. 31, no. 11, pp. 2582–2593, 2020.
- [29] B. Bylina, J. Potiopa, M. Klisowski, and J. Bylina, “The impact of vectorization and parallelization of the slope algorithm on performance and energy efficiency on multi-core architecture,” Annals of Computer Science and Information Systems, vol. 25, pp. 283–290, 2021.
Notes
1. Thematic Tracks Regular Papers
2. Record prepared with funds from MEiN, agreement no. SONP/SP/546092/2022, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science) - module: Popularisation of science and promotion of sport (2024).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-5bff5ad2-a80e-49b5-acdc-583d9030137b