Article title

Interactive Method for Semantic Document Indexing Based on Explicit Semantic Analysis

Publication languages
EN
Abstracts
EN
In this article we propose a general framework for semantic indexing and search of texts in scientific document repositories. In our approach, a semantic interpreter, which can be seen as a tool for automatic tagging of textual data, is updated interactively based on user feedback in order to improve the quality of the tags it produces. In our experiments, we index the document corpus using the Explicit Semantic Analysis (ESA) method. This algorithm uses an external knowledge base to measure relatedness between words and concepts, and uses those assessments to assign meaningful concepts to given texts. We explain how the weights expressing relations between particular words and concepts can be improved through interaction with users or by employing expert knowledge. We also present results of experiments on a document corpus acquired from the PubMed Central repository to demonstrate the feasibility of our approach.
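The idea summarized in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the toy concept descriptions, the TF-IDF weighting, the function names (`build_interpreter`, `tag`, `apply_feedback`) and the multiplicative feedback rule are all assumptions chosen for illustration. The sketch builds word-to-concept weights from a tiny stand-in knowledge base, tags a text with its highest-scoring concept, and then adjusts the weights after a user rejects a tag.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer; a real system would also stem and drop stop words.
    return [t for t in text.lower().split() if t.isalpha()]


def build_interpreter(concepts):
    """Return {word: {concept: weight}} using TF-IDF over concept descriptions."""
    n = len(concepts)
    counts = {c: Counter(tokenize(d)) for c, d in concepts.items()}
    df = Counter(w for cnt in counts.values() for w in cnt)  # document frequency
    weights = defaultdict(dict)
    for c, cnt in counts.items():
        for w, tf in cnt.items():
            weights[w][c] = tf * math.log(1 + n / df[w])
    return weights


def tag(text, weights, k=2):
    """Score each concept by summing the weights of the words in the text."""
    scores = Counter()
    for w in tokenize(text):
        for c, wt in weights.get(w, {}).items():
            scores[c] += wt
    return [c for c, _ in scores.most_common(k)]


def apply_feedback(text, concept, weights, accepted, rate=0.5):
    """Hypothetical update rule: scale the weights linking the text's words
    to `concept` up (tag accepted) or down (tag rejected)."""
    factor = 1 + rate if accepted else 1 - rate
    for w in set(tokenize(text)):
        if concept in weights.get(w, {}):
            weights[w][concept] *= factor


# Toy knowledge base standing in for an external resource (illustrative only).
concepts = {
    "Neoplasms": "tumor cancer cells growth tumor",
    "Infection": "virus bacteria infection cells",
    "Genetics": "gene dna mutation cells",
}
doc = "tumor growth linked to gene mutation"

weights = build_interpreter(concepts)
print(tag(doc, weights, k=1))   # ['Neoplasms']

# A user rejects the tag, so the interpreter is updated and the text re-tagged.
apply_feedback(doc, "Neoplasms", weights, accepted=False)
print(tag(doc, weights, k=1))   # ['Genetics']
```

The multiplicative update is only one possible way to fold feedback into the weights; the abstract does not state which rule the paper uses.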
Pages
423–438
Physical description
Bibliography: 22 items, figures, tables, charts.
Authors
author
  • Faculty of Mathematics, Informatics and Mechanics, The University of Warsaw, Banacha 2, 02-097, Warsaw, Poland
author
  • Chair of Computer Science, The Main School of Fire Service, Słowackiego 52/54, 01-629 Warsaw, Poland
author
  • Faculty of Mathematics, Informatics and Mechanics, The University of Warsaw, Banacha 2, 02-097, Warsaw, Poland
author
  • Faculty of Mathematics, Informatics and Mechanics, The University of Warsaw, Banacha 2, 02-097, Warsaw, Poland
Bibliography
  • [1] Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., Harshman, R.: Indexing by latent semantic analysis, Journal of the American society for information science, 41(6), 1990, 391–407.
  • [2] Fazzinga, B., Gianforme, G., Gottlob, G., Lukasiewicz, T.: Semantic Web search based on ontological conjunctive queries, Web Semantics: Science, Services and Agents on the World Wide Web, 2011.
  • [3] Feldman, R., Sanger, J., Eds.: The Text Mining Handbook, Cambridge University Press, 2007, ISBN 978-0-521-83657-9.
  • [4] Gabrilovich, E., Markovitch, S.: Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis, Proc. of The 20th Int. Joint Conf. on Artificial Intelligence, Hyderabad, India, 2007.
  • [5] Hliaoutakis, A., Varelas, G., Voutsakis, E., Petrakis, E. G. M., Milios, E.: Information Retrieval by Semantic Similarity, Int. Journal on Semantic Web and Information Systems (IJSWIS). Special Issue of Multimedia Semantics, 3(3), 2006, 55–73.
  • [6] Janusz, A., Nguyen, H. S., Ślęzak, D., Stawicki, S., Krasuski, A.: JRS’2012 Data Mining Competition: Topical Classification of Biomedical Research Papers, in: Yao et al. [22], 422–431.
  • [7] Janusz, A., Ślęzak, D., Nguyen, H. S.: Unsupervised Similarity Learning from Textual Data, Fundamenta Informaticae, 2012.
  • [8] Janusz, A., Świeboda, W., Krasuski, A., Nguyen, H. S.: Interactive Document Indexing Method Based on Explicit Semantic Analysis, in: Yao et al. [22], 156–165.
  • [9] Manning, C., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. 2008, Online edition, 2007.
  • [10] Mitchell, T. M.: Machine Learning, McGraw Hill series in computer science, McGraw-Hill, 1997, ISBN 978-0-07-042807-2.
  • [11] Nguyen, L. A., Nguyen, H. S.: On Designing the SONCA System, in: Intelligent Tools for Building a Scientific Information Platform (R. Bembenik, L. Skonieczny, H. Rybiński, M. Niezgódka, Eds.), Springer-Verlag, New York, 2012, 9–36.
  • [12] R Development Core Team: R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2008.
  • [13] Reynolds, A., Richards, G., De La Iglesia, B., Rayward-Smith, V.: Clustering rules: a comparison of partitioning and hierarchical clustering algorithms, Journal of Mathematical Modelling and Algorithms, 5(4), 2006, 475–504.
  • [14] Rinaldi, A. M.: An ontology-driven approach for semantic information retrieval on the Web, ACM Trans. Internet Technol., 9, 2009, 1–24, ISSN 1533-5399.
  • [15] Roberts, R. J.: PubMed Central: The GenBank of the published literature, Proceedings of the National Academy of Sciences of the United States of America, 98(2), 2001, 381–382.
  • [16] Rousseeuw, P. J.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, Journal of computational and applied mathematics, 20, 1987, 53–65.
  • [17] Ślęzak, D., Janusz, A., Świeboda, W., Nguyen, H., Bazan, J., Skowron, A.: Semantic Analytics of PubMed Content, Information Quality in e-Health, 2011, 63–74.
  • [18] Świeboda, W., Meina, M., Nguyen, H. S.: Weight Learning for Document Tolerance Rough Set Model, RSKT, 2013.
  • [19] Szczuka, M., Janusz, A., Herba, K.: Clustering of Rough Set Related Documents with use of Knowledge from DBpedia, Proc. of the 6th Int. Conf. on Rough Sets and Knowledge Technology (RSKT), 6954, Springer, 2011.
  • [20] United States National Library of Medicine: Introduction to MeSH - 2011, http://www.nlm.nih.gov/mesh/introduction.html, 2011.
  • [21] Wild, F., Stahl, C., Stermsek, G., Neumann, G.: Parameters driving effectiveness of automated essay scoring with LSA, 2005.
  • [22] Yao, J., Yang, Y., Slowinski, R., Greco, S., Li, H., Mitra, S., Polkowski, L., Eds.: Rough Sets and Current Trends in Computing - 8th International Conference, RSCTC 2012, Chengdu, China, August 17-20, 2012. Proceedings, vol. 7413 of Lecture Notes in Computer Science, Springer, 2012.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-003dcbee-0021-4a53-9dc4-73e723b5f1d0