Article title

Unsupervised Similarity Learning from Textual Data

Publication languages
EN
Abstracts
EN
This paper presents research on the construction of a new unsupervised model for learning a semantic similarity measure from text corpora. The model has two main components: a semantic interpreter of texts and a similarity function whose properties are derived from data. The semantic interpreter associates individual documents with concepts defined in a knowledge base that corresponds to the topics covered by the corpus. It shifts the representation of the meaning of the texts from words, which can be ambiguous, to concepts with predefined semantics. With this new representation, the similarity function is derived from data using a modification of the dynamic rule-based similarity model, adjusted to the unsupervised case. The adjustment is based on a novel notion of an information bireduct, which originates in the theory of rough sets. This extension of classical information reducts is used to find diverse sets of reference documents, described by diverse sets of reference concepts, that determine different aspects of the similarity. The paper explains the general idea of the approach and gives some implementation guidelines. Additionally, results of preliminary experiments are presented to demonstrate the usefulness of the proposed model.
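The shift from words to concepts described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy `KNOWLEDGE_BASE`, the term-counting `interpret` function, and the sample documents are hypothetical, and plain cosine similarity stands in for the paper's rule-based, bireduct-driven similarity function, which is not reproduced here.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy knowledge base: concept -> terms that signal it.
# A real knowledge base would be far richer; this is only an illustration.
KNOWLEDGE_BASE = {
    "Neoplasms": {"tumor", "cancer", "carcinoma"},
    "Genetics": {"gene", "dna", "mutation"},
    "Cardiology": {"heart", "cardiac", "artery"},
}


def interpret(text):
    """Map a text to concept weights (number of matching terms per concept)."""
    words = Counter(text.lower().split())
    weights = {}
    for concept, terms in KNOWLEDGE_BASE.items():
        score = sum(words[t] for t in terms)  # Counter returns 0 for absent words
        if score > 0:
            weights[concept] = score
    return weights


def cosine(u, v):
    """Cosine similarity of two sparse concept-weight dictionaries.

    Stand-in measure only; the paper derives its similarity function from data.
    """
    dot = sum(u[c] * v[c] for c in u.keys() & v.keys())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical documents: doc_a and doc_b share no words but share concepts.
doc_a = "mutation found in carcinoma samples"
doc_b = "cancer linked to a dna variant"
doc_c = "cardiac artery imaging study"

sim_ab = cosine(interpret(doc_a), interpret(doc_b))
sim_ac = cosine(interpret(doc_a), interpret(doc_c))
```

In this toy setting, `doc_a` and `doc_b` have no words in common, yet both map to the same two concepts, so their concept-level similarity is high, while `doc_a` and `doc_c` share no concepts at all.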
Pages
319--336
Physical description
Bibliography: 28 items, charts.
Authors
author
author
author
  • Faculty of Mathematics, Informatics and Mechanics, University of Warsaw, Banacha 2, 02-097 Warszawa, Poland, andrzejanusz@gmail.com
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-article-BUS8-0029-0008