Article title

Implementation of the Concept of a Repository for Automated Processing of Semi-Structural Data

Content
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
Semi-structural data tend to be problematic because their attributes are sparse and because, regardless of their type, they are immensely diverse. This makes data storage a challenge, especially when the data must be held in a relational database, which is often a strict requirement defined in advance. In this paper, we present a thoroughly described concept of a repository capable of storing and processing semi-structural data. Based on this concept, we establish a database model comprising the architecture and the tools needed to search the data and build relevant processors. The processor described may assign roles and dispatch tasks between users. We demonstrate how this repository overcomes current limitations by creating a system for facilitated digitization of scientific resources. In addition, we show that the repository is suitable for general use and, as such, may be adapted to any domain in which semi-structural data are processed, without any additional work required.
Keywords
Year
Volume
Pages
76-86
Physical description
Bibliography: 44 items, figures
Authors
  • AGH University of Science and Technology, Mickiewicza 30, 30-059 Cracow, Poland
  • AGH University of Science and Technology, Mickiewicza 30, 30-059 Cracow, Poland
author
  • AGH University of Science and Technology, Mickiewicza 30, 30-059 Cracow, Poland
  • AGH University of Science and Technology, Mickiewicza 30, 30-059 Cracow, Poland
Bibliography
  • [1] J. McHugh, S. Abiteboul, R. Goldman, D. Quass, and J. Widom, „Lore: A database management system for semistructured data", ACM SIGMOD Rec., vol. 26, no. 3, pp. 54-66, 1997 (doi: 10.1145/262762.262770).
  • [2] R. Goldman, J. McHugh, and J. Widom, „From semistructured data to XML: Migrating the Lore data model and query language", in Proc. of the 2nd Int. Worksh. on the Web and Databases WebDB'99, Philadelphia, PA, USA, 1999 [Online]. Available: http://infolab.stanford.edu/lore/pubs/xml.pdf
  • [3] J. Shanmugasundaram, K. Tufte, G. He, C. Zhang, D. DeWitt, and J. Naughton, „Relational databases for querying XML documents: Limitations and opportunities", in Proc. of the 25th Int. Conf. on Very Large Data Bases VLDB'99, Edinburgh, Scotland, 1999, pp. 302-314 [Online]. Available: http://www.vldb.org/conf/1999/P31.pdf
  • [4] M. Rys, „XML and relational database management systems: inside Microsoft SQL Server 2005", in Proc. of the ACM SIGMOD Int. Conf. on Manag. of Data, Baltimore, MD, USA, 2005, pp. 958-962 (doi: 10.1145/1066157.1066301).
  • [5] R. Marcjan and J. Wyrostek, „Processing XML documents on the basis of quasi-relational model and SQLxD language", Studia Informatica, vol. 32, no. 2A, pp. 111-120, 2011 (doi: 10.21936/si2011_v32.n2A.253).
  • [6] N. Nurseitov, M. Paulson, R. Reynolds, and C. Izurieta, „Comparison of JSON and XML data interchange formats: a case study", in Proc. of the ISCA 22nd Int. Conf. on Comp. Appl. in Indust. and Engin. CAINE 2009, San Francisco, CA, USA, 2009, vol. 9, pp. 157-162 [Online]. Available: https://www.cs.montana.edu/izurieta/pubs/IzurietaCAINE2009.pdf
  • [7] G. Wang, „Improving data transmission in web applications via the translation between XML and JSON", in Proc. 3rd Int. Conf. on Commun. and Mob. Comput., Qingdao, China, 2011, pp. 182-185 (doi: 10.1109/CMC.2011.25).
  • [8] M. Piech and R. Marcjan, „A new approach to storing dynamic data in relational databases using JSON", Computer Science, vol. 19, no. 1, 2018 (doi: 10.7494/csci.2018.19.1.2505).
  • [9] H. Dayani-Fard and I. Jurisica, „Dynamic semi-structured repository for mining software and software-related information", U.S. Patent No. 6,339,776, 2002 [Online]. Available: https://patents.google.com/patent/CA2284949A1/en.
  • [10] V. Christophides, M. Dorr, and I. Fundulaki, „A semantic network approach to semi-structured documents repositories", in Research and Advanced Technology for Digital Libraries, First European Conference, ECDL'97 Pisa, Italy, September 1-3, 1997 Proceedings, C. Peters and C. Thanos, Eds. LNCS, vol. 1324, pp. 305-324. Berlin, Heidelberg: Springer, 1997 (doi: 10.1007/BFb0026735).
  • [11] D. Tahara, T. Diamond, and D. J. Abadi, „Sinew: a SQL system for multistructured data", in Proc. of the ACM SIGMOD Int. Conf. on Manag. of Data, Snowbird, UT, USA, 2014, pp. 815-826 (doi: 10.1145/2588555.2612183).
  • [12] M. Smaïl-Tabbone, S. Osman, N. Messai, A. Napoli, and M. D. Devignes, „BioRegistry: A structured metadata repository for bioinformatic databases", in Computational Life Sciences First International Symposium, CompLife 2005, Konstanz, Germany, September 25-27, 2005. Proceedings, R. Berthold et al., Eds. LNCS, vol. 3695, pp. 46-56. Berlin, Heidelberg: Springer, 2005 (doi: 10.1007/11560500_5).
  • [13] D. Tsirogiannis et al., „Scalable analysis platform for semistructured data", U.S. Patent No. 9,613,068, 2017 [Online]. Available: https://patents.google.com/patent/US9613068B2/en
  • [14] D. Florescu, „Managing semi-structured data", Queue, vol. 3, no. 8, pp. 18-24, 2005 (doi: 10.1145/1103822.1103832).
  • [15] R. Agrawal et al., „System and method for organizing repositories of semistructured documents such as email", U.S. Patent No. 6,592,627, 2003 [Online]. Available: https://patents.google.com/patent/US6592627B1/en
  • [16] D. L. Draper, D. B. Christianson, and K. L. Komissarchik, „Method and apparatus for storing semi-structured data in a structured manner", U.S. Patent No. 6,581,062, 2003 [Online]. Available: https://patents.google.com/patent/US20060265410
  • [17] J. Komissarchik and E. Komissarchik, „System and method for facts extraction and domain knowledge repository creation from unstructured and semi-structured documents", U.S. Patent No. 8,682,674, 2014 [Online]. Available: https://patents.google.com/patent/US7756807
  • [18] F. S. Tseng and W. J. Hwung, „An automatic load/extract scheme for XML documents through object-relational repositories", J. of Syst. and Softw., vol. 64, no. 3, pp. 207-218, 2002 (doi: 10.1016/S0164-1212(02)00044-4).
  • [19] C. C. Huang and C. M. Kuo, „The transformation and search of semistructured knowledge in organizations", J. of Knowl. Manag., vol. 7, no. 4, pp. 106-123, 2003 (doi: 10.1108/13673270310492985).
  • [20] G. Dobbie, X. Wu, T. W. Ling, and M. L. Lee, „ORA-SS: An Object-Relationship-Attribute Model for Semi-Structured Data", Tech. Rep., School of Computing, Singapore, 2000 [Online]. Available: https://pdfs.semanticscholar.org/9371/c2ae3e59e2c8b107b39525318ca3ce36c90d.pdf
  • [21] R. Rajugan, T. S. Dillon, E. Chang, and L. Feng, „A layered view model for XML repositories and XML data warehouses", in Proc. of the 5th Int. Conf. on Comp. and Inform. Technol. CIT'05, Shanghai, China, 2005, pp. 206-215 (doi: 10.1109/CIT.2005.15).
  • [22] L. Liu, C. Pu, W. Han, D. Buttler, and W. Tang, „Building an extensible wrapper repository system: A metadata approach", in Proc. of the 3rd IEEE Comp. Soc. Metadata Conf., Bethesda, MD, USA, 1999.
  • [23] M. Mani and N. Sundaresan, „System and method for query processing and optimization for XML repositories", U.S. Patent No. 6,654,734, 2003 [Online]. Available: https://patents.google.com/patent/US6654734B1/en
  • [24] S. E. Madnick and M. D. Siegel, „Query and retrieving semistructured data from heterogeneous sources by translating structured queries", U.S. Patent No. 6,282,537, 2001 [Online]. Available: https://patents.google.com/patent/US6282537B1/en
  • [25] D. Skoutas and A. Simitsis, „Ontology-based conceptual design of ETL processes for both structured and semi-structured data", Int. J. on Semantic Web and Inform. Syst. (IJSWIS), vol. 3, no. 4, pp. 1-24, 2007 (doi: 10.4018/jswis.2007100101).
  • [26] Metasonic Flow, „User Manual V5.3.5", Metasonic AG Pfaffenhofen [Online]. Available: https://www.metasonic.de/
  • [27] Doxis4 iECM, „Doxis4 Architecture", Ser Solutions [Online]. Available: https://www.sergroup.com/en/technology.html
  • [28] M. Piech et al., „Model for dynamic and hierarchical data repository in relational database", Computer Science, vol. 19, no. 4, 2018 (doi: 10.7494/csci.2018.19.4.3088).
  • [29] J. Han, E. Haihong, G. Le, and J. Du, „Survey on NoSQL database", in Proc. 6th Int. Conf. on Pervasive Comput. and Appl., Port Elizabeth, South Africa, 2011, pp. 363-366 (doi: 10.1109/ICPCA.2011.6106531).
  • [30] P. P. S. Chen, „The entity-relationship model - toward a unified view of data", ACM Trans. on Database Syst. (TODS), vol. 1, no. 1, pp. 9-36, 1976 (doi: 10.1145/320434.320440).
  • [31] K. Y. Whang, B. K. Park, W. S. Han, and Y. K. Lee, „Inverted index storage structure using subindexes and large objects for tight coupling of information retrieval with database management systems", U.S. Patent No. 6,349,308, 2002 [Online]. Available: https://patents.google.com/patent/US6349308B1/en
  • [32] Z. H. Liu, B. Hammerschmidt, D. McMahon, Y. Liu, and H. J. Chang, „Closing the functional and performance gap between SQL and NoSQL", in Proc. of the Int. Conf. on Manag. of Data, San Francisco, CA, USA, 2016, pp. 227-238 (doi: 10.1145/2882903.2903731).
  • [33] G. Litt, S. Thompson, and J. Whittaker, „Improving performance of schemaless document storage in PostgreSQL using BSON", CPSC 438 Final Project, April 29, 2013, New Haven, CT [Online]. Available: https://www.geoffreylitt.com/resources/Postgres-BSON.pdf
  • [34] M. Fowler, „CQRS", Martin Fowler's Blog, 2011 [Online]. Available: https://martinfowler.com/bliki/CQRS.html
  • [35] DB-Engines Ranking of Search Engines [Online]. Available: https://db-engines.com/en/ranking/search+engine (accessed on 2019-09-01)
  • [36] N. H. Lim, „PostgreSQL [9.5.0] vs MariaDB [10.1.11] vs MySQL [5.7.0]", 2016 [Online]. Available: http://nghenglim.github.io/PostgreSQL-9.5.0-vs-MariaDB-10.1.11-vs-MySQL-5.7.0-year-2016 (accessed on 2019-09-01)
  • [37] J. Dajda, R. Dębski, M. Kisiel-Dorohinicki, and K. Piętak, „Multidomain data integration for criminal intelligence", in Man-Machine Interactions 3, A. Gruca, T. Czachórski, and S. Kozielski, Eds. Advances in Intelligent Systems and Computing series (AISC), vol. 242, pp. 345-352. Springer, 2014 (doi: 10.1007/978-3-319-02309-0_37).
  • [38] M. R. Durose, A. D. Cooper, and H. N. Snyder, „Collecting and Processing Multistate Criminal-history Data for Statistical Analysis", US Department of Justice, Office of Justice Programs, Bureau of Justice Statistics, 2019 [Online]. Available: https://www.bjs.gov/content/pub/pdf/cpmchdsa.pdf
  • [39] A. J. Singer et al., „Victimization, fear of crime, and trust in criminal justice institutions: A cross-national analysis", Crime & Delinquency, vol. 65, no. 6, pp. 82-844, 2019 (doi: 10.1177/0011128718787513).
  • [40] R. S. Chen et al., „Exploring performance issues for a clinical database organized using an entity-attribute-value representation", J. of the Amer. Med. Inform. Assoc., vol. 7, no. 5, pp. 475-487, 2000 (doi: 10.1136/jamia.2000.0070475).
  • [41] P. M. Nadkarni et al., „Organization of heterogeneous scientific data using the EAV/CR representation", J. of the Amer. Med. Inform. Assoc., vol. 6, no. 6, pp. 478-493, 1999 (doi: 10.1136/jamia.1999.0060478).
  • [42] O. J. Reichman, M. B. Jones, and M. P. Schildhauer, „Challenges and opportunities of open data in ecology", Science, vol. 331, no. 6018, pp. 703-705, 2011 (doi: 10.1126/science.1197962).
  • [43] M. Wiener, F. T. Sommer, Z. G. Ives, R. A. Poldrack, and B. Litt, „Enabling an open data ecosystem for the neurosciences", Neuron, vol. 92, no. 4, pp. 617-621, 2016 (doi: 10.1016/j.neuron.2016.11.009).
  • [44] V. Tiwari and R. S. Thakur, „An extended views based big data model toward facilitating electronic health record analytics", in Telemedicine Technologies, H. D. Jude and V. E. Balas, Eds. Academic Press, 2019, pp. 193-199 (doi: 10.1016/B978-0-12-816948-3.00013-1).
Notes
Record prepared with funds from MNiSW (the Polish Ministry of Science and Higher Education), agreement No. 461252, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science), module: Popularization of science and promotion of sport (2020).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-2b4aeae1-58f6-4f49-9a85-25e7eccbe7e4