Interactive Method for Semantic Document Indexing Based on Explicit Semantic Analysis

Świeboda, W.; Krasuski, A.; Nguyen, H. S.; Janusz, A.

doi:10.3233/FI-2014-1052

Article - details

Article title

Interactive Method for Semantic Document Indexing Based on Explicit Semantic Analysis

Authors

Świeboda W., Krasuski A., Nguyen H. S., Janusz A.

Selected full texts from this journal

https://fi.episciences.org/

Identifiers

DOI

10.3233/FI-2014-1052

Abstract

In this article we propose a general framework that incorporates semantic indexing and search of texts within scientific document repositories. In our approach, a semantic interpreter, which can be seen as a tool for automatic tagging of textual data, is interactively updated based on user feedback in order to improve the quality of the tags it produces. In our experiments, we index our document corpus using the Explicit Semantic Analysis (ESA) method. This algorithm uses an external knowledge base to measure relatedness between words and concepts, and uses those assessments to assign meaningful concepts to given texts. In the paper, we explain how the weights expressing relations between particular words and concepts can be improved through interaction with users or by employing expert knowledge. We also present results of experiments on a document corpus acquired from the PubMed Central repository to show the feasibility of our approach.
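
The indexing loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three-concept knowledge base, the TF-IDF weighting, the function names (build_weights, interpret, apply_feedback), and the multiplicative feedback rule with its rate parameter are all assumptions made for illustration. The sketch only shows the general idea of scoring concepts from word-concept weights and then adjusting those weights after a user accepts or rejects a tag.

```python
import math
from collections import Counter

# Hypothetical toy knowledge base: concept name -> descriptive text.
# A real system would use a large external resource instead.
KNOWLEDGE_BASE = {
    "Neoplasms": "tumor cancer growth cells malignant",
    "Cardiology": "heart cardiac artery blood pressure",
    "Genetics": "gene dna mutation cells heredity",
}


def build_weights(kb):
    """Build word -> {concept: weight} using TF-IDF over concept texts."""
    n_concepts = len(kb)
    term_counts = {c: Counter(text.split()) for c, text in kb.items()}
    doc_freq = Counter(w for counts in term_counts.values() for w in counts)
    weights = {}
    for concept, counts in term_counts.items():
        for word, freq in counts.items():
            idf = math.log(n_concepts / doc_freq[word])
            weights.setdefault(word, {})[concept] = freq * idf
    return weights


def interpret(text, weights):
    """Score each concept by summing the weights of the words in the text."""
    scores = Counter()
    for word in text.lower().split():
        for concept, weight in weights.get(word, {}).items():
            scores[concept] += weight
    return scores


def apply_feedback(text, concept, accepted, weights, rate=0.5):
    """Assumed update rule: scale the weights linking the text's words
    to the concept up (tag accepted) or down (tag rejected)."""
    factor = 1 + rate if accepted else 1 - rate
    for word in set(text.lower().split()):
        if concept in weights.get(word, {}):
            weights[word][concept] *= factor


if __name__ == "__main__":
    w = build_weights(KNOWLEDGE_BASE)
    document = "gene mutation in tumor cells"
    print("before feedback:", interpret(document, w).most_common())
    # A user rejects one suggested tag and confirms another.
    apply_feedback(document, "Genetics", accepted=False, weights=w)
    apply_feedback(document, "Neoplasms", accepted=True, weights=w)
    print("after feedback:", interpret(document, w).most_common())
```

In this sketch the feedback changes the shared word-concept weights rather than the tags of a single document, so later documents containing the same words are tagged differently as well. That is the sense in which the interpreter is "interactively updated"; the actual update procedure is the one described in the paper, not the rule assumed here.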

Keywords

semantic search; interactive learning; explicit semantic analysis; PubMed; MeSH

Publisher

IOS Press

Journal

Fundamenta Informaticae

Year

2014

Volume

Vol. 132, no. 3

Pages

423–438

Physical description

Bibliography: 22 items, figures, tables, charts.

Contributors

author

Świeboda W.

Faculty of Mathematics, Informatics and Mechanics, The University of Warsaw, Banacha 2, 02-097, Warsaw, Poland

author

Krasuski A.

Chair of Computer Science, The Main School of Fire Service, Słowackiego 52/54, 01-629 Warsaw, Poland

author

Nguyen H. S.

son@mimuw.edu.pl

Faculty of Mathematics, Informatics and Mechanics, The University of Warsaw, Banacha 2, 02-097, Warsaw, Poland

author

Janusz A.

A.Janusz@mimuw.edu.pl

Faculty of Mathematics, Informatics and Mechanics, The University of Warsaw, Banacha 2, 02-097, Warsaw, Poland

Bibliography

[1] Deerwester, S., Dumais, S. T., Furnas, G. W., Landauer, T. K., Harshman, R.: Indexing by latent semantic analysis, Journal of the American Society for Information Science, 41(6), 1990, 391–407.
[2] Fazzinga, B., Gianforme, G., Gottlob, G., Lukasiewicz, T.: Semantic Web search based on ontological conjunctive queries, Web Semantics: Science, Services and Agents on the World Wide Web, 2011.
[3] Feldman, R., Sanger, J., Eds.: The Text Mining Handbook, Cambridge University Press, 2007, ISBN 978-0-521-83657-9.
[4] Gabrilovich, E., Markovitch, S.: Computing Semantic Relatedness using Wikipedia-based Explicit Semantic Analysis, Proc. of The 20th Int. Joint Conf. on Artificial Intelligence, Hyderabad, India, 2007.
[5] Hliaoutakis, A., Varelas, G., Voutsakis, E., Petrakis, E. G. M., Milios, E.: Information Retrieval by Semantic Similarity, Int. Journal on Semantic Web and Information Systems (IJSWIS). Special Issue of Multimedia Semantics, 3(3), 2006, 55–73.
[6] Janusz, A., Nguyen, H. S., Ślęzak, D., Stawicki, S., Krasuski, A.: JRS’2012 Data Mining Competition: Topical Classification of Biomedical Research Papers, in: Yao et al. [22], 422–431.
[7] Janusz, A., Ślęzak, D., Nguyen, H. S.: Unsupervised Similarity Learning from Textual Data, Fundamenta Informaticae, 2012.
[8] Janusz, A., Swieboda, W., Krasuski, A., Nguyen, H. S.: Interactive Document Indexing Method Based on Explicit Semantic Analysis, in: Yao et al. [22], 156–165.
[9] Manning, C., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. 2008, Online edition, 2007.
[10] Mitchell, T. M.: Machine Learning, McGraw Hill series in computer science, McGraw-Hill, 1997, ISBN 978-0-07-042807-2.
[11] Nguyen, L. A., Nguyen, H. S.: On Designing the SONCA System, in: Intelligent Tools for Building a Scientific Information Platform (R. Bembenik, L. Skonieczny, H. Rybiński, M. Niezgódka, Eds.), Springer-Verlag, New York, 2012, 9–36.
[12] R Development Core Team: R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2008.
[13] Reynolds, A., Richards, G., De La Iglesia, B., Rayward-Smith, V.: Clustering rules: a comparison of partitioning and hierarchical clustering algorithms, Journal of Mathematical Modelling and Algorithms, 5(4), 2006, 475–504.
[14] Rinaldi, A. M.: An ontology-driven approach for semantic information retrieval on the Web, ACM Trans. Internet Technol., 9, 2009, 1–24, ISSN 1533-5399.
[15] Roberts, R. J.: PubMed Central: The GenBank of the published literature, Proceedings of the National Academy of Sciences of the United States of America, 98(2), 2001, 381–382.
[16] Rousseeuw, P. J.: Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, Journal of Computational and Applied Mathematics, 20, 1987, 53–65.
[17] Ślęzak, D., Janusz, A., Świeboda, W., Nguyen, H., Bazan, J., Skowron, A.: Semantic Analytics of PubMed Content, Information Quality in e-Health, 2011, 63–74.
[18] Świeboda, W., Meina, M., Nguyen, H. S.: Weight Learning for Document Tolerance Rough Set Model, RSKT, 2013.
[19] Szczuka, M., Janusz, A., Herba, K.: Clustering of Rough Set Related Documents with use of Knowledge from DBpedia, Proc. of the 6th Int. Conf. on Rough Sets and Knowledge Technology (RSKT), 6954, Springer, 2011.
[20] United States National Library of Medicine: Introduction to MeSH - 2011, http://www.nlm.nih.gov/mesh/introduction.html, 2011.
[21] Wild, F., Stahl, C., Stermsek, G., Neumann, G.: Parameters driving effectiveness of automated essay scoring with LSA, 2005.
[22] Yao, J., Yang, Y., Slowinski, R., Greco, S., Li, H., Mitra, S., Polkowski, L., Eds.: Rough Sets and Current Trends in Computing - 8th International Conference, RSCTC 2012, Chengdu, China, August 17-20, 2012. Proceedings, vol. 7413 of Lecture Notes in Computer Science, Springer, 2012.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-003dcbee-0021-4a53-9dc4-73e723b5f1d0