Knowledge Detection and Discovery using Semantic Graph Embeddings on Large Knowledge Graphs generated on Text Mining Results

Dörpinghaus, Jens; Jacobs, Marc

doi:10.15439/2020F36

Article - details

Selected full texts from this journal

http://annals-csis.org

Conference

15th Federated Conference on Computer Science and Information Systems, 6-9 September 2020, Sofia, Bulgaria

Abstract

Knowledge graphs play a central role in big data integration, especially for connecting data from different domains. Bringing unstructured texts, e.g. from scientific literature, into a structured, comparable format is one of the key assets. Here, we use knowledge graphs in the biomedical domain together with text-mining-based document data for knowledge extraction and retrieval from text and natural language structures. For example, cause-and-effect models can potentially facilitate clinical decision making or help to drive research towards precision medicine. However, the power of knowledge graphs critically depends on context information. We provide a novel semantic approach towards a context-enriched biomedical knowledge graph utilizing data integration with linked data applied to language technologies and text mining. This graph concept can be used for graph embedding applied in different approaches, e.g. with a focus on topic detection, document clustering and knowledge discovery. We discuss algorithmic approaches to tackle these challenges and show results for several applications such as search query finding and knowledge discovery. The presented approaches lead to valuable results on large knowledge graphs.

Keywords

data mining; decision making; graph theory; query processing; text analysis

Publisher

Polskie Towarzystwo Informatyczne

Journal

Annals of Computer Science and Information Systems

Year

2020

Volume

Vol. 21

Pages

169-178

Physical description

Bibliography: 33 items; illustrations, charts.

Authors

author

Dörpinghaus Jens

jens.doerpinghaus@dzne.de

German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany
Fraunhofer Institute for Algorithms and Scientific Computing, Schloss Birlinghoven, Sankt Augustin, Germany

author

Jacobs Marc

Fraunhofer Institute for Algorithms and Scientific Computing, Schloss Birlinghoven, Sankt Augustin, Germany

Notes

1. Track 1: Artificial Intelligence

2. Technical Session: 5th International Workshop on Language Technologies and Applications

3. Record prepared with funds from MNiSW, agreement No. 461252, under the programme "Social Responsibility of Science" - module: Popularisation of science and promotion of sport (2021).

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-5af3b583-c2cd-43f9-8a78-f7020f408983