Analiza efektywności procesów ETL realizowanych z użyciem języków SQL i Apache HiveQL

Litka, Krzysztof

doi:10.35784/jcsi.3674

Title variants

Analyze the effectiveness of ETL processes implemented using SQL and Apache HiveQL languages

Publication languages

Abstracts

In the era of digitization, as data is collected in ever-increasing volumes, it must be processed efficiently. The article analyzes the performance of SQL and HiveQL in scenarios of varying complexity, focusing on the execution time of individual queries. The tools used in the study are also discussed. The results for each language are summarized and compared, highlighting their strengths and weaknesses and identifying their possible areas of application.

Keywords

ETL, SQL, HiveQL

Publisher

Wydawnictwo Politechniki Lubelskiej

Journal

Journal of Computer Sciences Institute

Year

2023

Volume

Vol. 28

Pages

204-209

Physical description

Bibliography: 14 items, figures

Contributors

Author

Litka Krzysztof

s99174@pollub.edu.pl

Lublin University of Technology (Poland)

Bibliography

1. E. Capriolo, D. Wampler, J. Rutherglen, Programming Hive: Data Warehouse and Query Language for Hadoop, O'Reilly Media, 1st edition, 2012.
2. J. Caserta, R. Kimball, The Data Warehouse ETL Toolkit, Wiley, 2004.
3. Cloudera Data Platform, https://www.cloudera.com/products/cloudera-data-platform.html, [25.05.2023].
4. J. Dean, S. Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, Communications of the ACM 51(1) (2008) 107-113, https://doi.org/10.1145/1327452.1327492.
5. B. Karwin, SQL Antipatterns: Avoiding the Pitfalls of Database Programming, Pragmatic Programmers LLC, 1st edition, 2017.
6. P. Mellor, SQL and Relational Theory: How to Write Accurate SQL Code, O'Reilly Media Inc., 2011.
7. B. Oliveira, O. Belo, J. Caldeira, A Systematic Literature Review on Big Data Extraction, Transformation and Loading (ETL), Proceedings of the 2021 Computing Conference Volume 2 held virtually (2021) 308-324, https://doi.org/10.1007/978-3-030-80126-7_24.
8. A. Pelikant, Hurtownie danych. Od przetwarzania analitycznego do raportowania, 2nd edition, Helion, 2021.
9. A. Simitsis, P. Vassiliadis, T. Sellis, Optimizing ETL processes in data warehouses, 21st International Conference on Data Engineering (ICDE'05), Tokyo, Japan (2005) 564-575, https://doi.org/10.1109/ICDE.2005.103.
10. A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, N. Zhang, Hive - a Petabyte Scale Data Warehouse using Hadoop, 2010 IEEE 26th International Conference on Data Engineering (ICDE 2010), Long Beach, CA, USA (2010) 996-1005, https://doi.org/10.1109/ICDE.2010.5447738.
11. A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, S. Anthony, H. Liu, P. Wyckoff, R. Murthy, Hive: a warehousing solution over a map-reduce framework, Proceedings of the VLDB Endowment 2(2) (2009) 1626-1629, https://doi.org/10.14778/1687553.1687609.
12. T. White, Hadoop: The definitive guide, O'Reilly Media Inc., 2012.
13. P. C. Zikopoulos, C. Eaton, Understanding big data: Analytics for enterprise class Hadoop and streaming data, McGraw-Hill Osborne Media, 2011.
14. N. Ahmed, S. Ahamed, J. I. Rahim, Data Processing in Hive vs. SQL Server: A comparative analysis in the query performance, 2017 IEEE 3rd International Conference on Engineering Technologies and Social Sciences, Bangkok, Thailand (2017) 1-5, https://doi.org/10.1109/ICETSS.2017.8324202.

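Notes

Record prepared with funding from MNiSW (the Polish Ministry of Science and Higher Education), agreement no. POPUL/SP/0154/2024/02, under the programme "Społeczna odpowiedzialność nauki II" (Social Responsibility of Science II), module: Popularyzacja nauki (Popularization of Science) (2025).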

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-05c21d55-8720-4f97-8242-d74ffe77583b