Article title
Abstract
In a data warehouse architecture, heterogeneous and distributed data sources (DSs) are integrated by means of an extract-transform-load (ETL) layer, which runs integration processes (a.k.a. ETL processes). This layer is not static, since the DSs being integrated change their schemas over time. A DS schema change impacts ETL processes, which typically stop working and need to be re-designed (i.e., repaired). Our overall goal is to automatically repair the ETL processes affected by DS schema changes. In this paper we focus on ETL processes specified in extended relational algebra, since relational data warehouses are among the most popular for business applications. For such processes, we contribute a repair method. The method uses a rule engine that maps each possible DS schema change to: (1) an ETL operation on the changed schema element and (2) a repair rule applicable when that DS schema element is changed. Based on this mapping, when a DS schema change occurs, our solution applies the appropriate repair rules to the affected ETL processes.
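The mapping described in the abstract can be pictured with a minimal sketch. It is not the authors' implementation: the class names `SchemaChange` and `Operation`, the change kinds `rename_attribute` and `drop_attribute`, the `RULES` table, and the function `repair` are all hypothetical, chosen only to illustrate a lookup from a (schema change, ETL operation) pair to a repair rule.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class SchemaChange:
    """A data source schema change (hypothetical model)."""
    kind: str            # assumed kinds: "rename_attribute", "drop_attribute"
    attribute: str       # the schema element that changed
    new_name: str = ""   # only used by "rename_attribute"


@dataclass(frozen=True)
class Operation:
    """One relational algebra step of an ETL process (hypothetical model)."""
    kind: str                     # assumed kinds: "projection", "selection"
    attributes: Tuple[str, ...]   # attributes the step refers to


RepairRule = Callable[[Operation, SchemaChange], Operation]


def rename_in_operation(op: Operation, change: SchemaChange) -> Operation:
    """Replace the old attribute name with the new one."""
    attrs = tuple(change.new_name if a == change.attribute else a
                  for a in op.attributes)
    return replace(op, attributes=attrs)


def drop_from_projection(op: Operation, change: SchemaChange) -> Operation:
    """Remove a dropped attribute from a projection list."""
    attrs = tuple(a for a in op.attributes if a != change.attribute)
    return replace(op, attributes=attrs)


# Assumed rule table: (change kind, operation kind) -> repair rule.
RULES: Dict[Tuple[str, str], RepairRule] = {
    ("rename_attribute", "projection"): rename_in_operation,
    ("rename_attribute", "selection"): rename_in_operation,
    ("drop_attribute", "projection"): drop_from_projection,
}


def repair(process: List[Operation], change: SchemaChange) -> List[Operation]:
    """Apply the matching repair rule to every operation the change affects."""
    repaired = []
    for op in process:
        if change.attribute not in op.attributes:
            repaired.append(op)  # operation does not touch the changed element
            continue
        rule = RULES.get((change.kind, op.kind))
        if rule is None:
            raise LookupError(f"no repair rule for {change.kind} on {op.kind}")
        repaired.append(rule(op, change))
    return repaired
```

In this sketch, renaming `age` to `years` rewrites every selection and projection that mentions `age`, while dropping an attribute used by a selection raises `LookupError`, because the assumed table has no rule for that pair and the case would need manual repair.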
Year
Volume
Pages
157-190
Physical description
Bibliography: 60 items, figures, tables.
Authors
author
- Université libre de Bruxelles, Belgium
author
- Poznan University of Technology, Poland
author
- Université libre de Bruxelles, Belgium
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-73c8476d-4ae8-4fd5-aecb-17031145423a