Ontology-Based Information Extraction: Current Approaches
The purpose of Information Extraction (IE) is to extract information from unstructured or semi-structured machine-readable documents by automatic means. Generally, this means processing human-language texts using natural language processing (NLP) techniques. Recently, a new approach has emerged among IE systems: Ontology-Based IE (OBIE), which is steadily gaining supporters. In this approach, a crucial role in the IE process is played by an ontology (a formal representation of knowledge as a set of concepts within a domain and the relationships between those concepts). Using an ontology as one of the IE tools makes OBIE a very convenient approach for gathering information that can later be used in constructing the Semantic Web. In this paper I explain the idea of OBIE, with its flaws and advantages. I try not only to present the theoretical approach but also to review current trends in this field. The aim is to point out a common architecture in currently used systems and, finally, to classify them by different factors according to their usability in real-life applications. In conclusion, an attempt is made to identify possible trends and directions in this field.
Bibliography: 39 items, figures.
-  Huettner, A.K., et al. Automatic Extraction of Facts from Press Releases to Generate News Stories. London, 1980.
-  Eikvil, L. “Information Extraction from World Wide Web — A Survey”. July 1999.
-  Riloff, E. Information extraction as a stepping stone toward story understanding. MIT Press, 1999.
-  Russell, S. and P. Norvig. Artificial Intelligence: A Modern Approach. 2nd Edition. Prentice-Hall, 2003.
-  Suchanek, F.M., Gjergji Kasneci and Gerhard Weikum. YAGO: A Core of Semantic Knowledge Unifying WordNet and Wikipedia. 2007.
-  Gruber, T.R. “A Translation Approach to Portable Ontology Specifications”. September 1992.
-  Bontcheva, K. Open-source Tools for Creation, Maintenance, and Storage of Lexical Resources for Language Generation from Ontologies. 2004.
-  Wimalasuriya, D.C. and D. Dou. “Ontology-Based Information Extraction: An Introduction and a Survey of Current Approaches”. Journal of Information Science 2010.
-  Fawzi, M. “Wikipedia 3.0: The End of Google?” 06.2006.
-  Shadbolt, N., W. Hall, and T. Berners-Lee. “The Semantic Web Revisited”. IEEE Intel. Sys., April 2007.
-  Chavez, A., and P. Maes. Kasbah: An Agent Marketplace for Buying and Selling Goods. MIT Media Lab.
-  Popov, B., et al. “KIM — a semantic platform for information extraction and retrieval”. J. of Nat. Lang. Eng. 2004.
-  Grau, B.C., et al. “OWL 2: The next step for OWL”. “Semantic Web Challenge” conf. paper. 2007.
-  Knublauch, H., and M.A. Musen. “Editing Description Logic Ontologies with the Protégé OWL Plugin”. Stanford Medical Informatics Review 2004.
-  Laclavik, M., M. Seleng, and M. Babik. OnTeA: Semiautomatic Ontology based Text Annotation Method. 2007.
-  Wu, S.-H., T.-H. Tsai, and W.-L. Hsu. Text Categorization Using Automatically Acquired Domain Ontology. 2003.
-  Buitelaar, P., D. Olejnik, and M. Sintek. A Protégé Plug-In for Ontology Extraction from Text Based on Linguistic Analysis. 2004.
-  “7 Search Evolutions for ‘07”. Business Wire Magazine 2007.
-  Stross, R. “The Human Touch That May Loosen Google’s Grip”. New York Times, June 2007.
-  Wu, F., and D.S. Weld. Autonomously Semantifying Wikipedia. 2007.
-  Feldman, R., and J. Sanger. The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data. Cambridge University Press, 2007.
-  do Prado, H.A., and E. Ferneda. Emerging Technologies of Text Mining: Techniques and Applications. 2008.
-  Riloff, E. Automatically constructing a dictionary for information extraction tasks. 1993.
-  Ciravegna, F. “(LP)2 An adaptive algorithm for information extraction from Web-related texts”. Workshop on Adaptive Text Extraction and Mining, 2001.
-  Tang, J., et al. “E-mail data cleaning”. SIGKDD’2005, 2005.
-  Boser, B.E., I.M. Guyon and V.N. Vapnik. A training algorithm for optimal margin classifiers. ACM Press, 1992.
-  Tang, J., et al. Information Extraction: Methodologies and Applications. 2008.
-  Peng, F. and A. McCallum. “Accurate information extraction from research papers using conditional random fields”. HLT-NAACL, 2004.
-  Zhu, J., et al. “2D conditional random fields for Web information extraction”. ICML2005, 2005.
-  Sutton, C., and A. McCallum. “An introduction to conditional random fields for relational learning”. Statistical relational learning 2005.
-  Tang, J., et al. “A new approach to personal network search based on information extraction”. ASWC 2006.
-  Popov, B., et al. “KIM — semantic annotation platform”. Proceedings of the 2nd International Semantic Web Conference, 2003.
-  Buitelaar, P. and M. Siegel. “Ontology-based Information Extraction with SOBA”. Fifth International Conference on Language Resources and Evaluation, 2006.
-  Banko, M., et al. Open information extraction from the web. AAAI Press, 2007.
-  Maedche, A. and S. Staab. “The Text-To-Onto Ontology Learning Environment”. Eighth International Conference on Conceptual Structures, 2000.
-  Wu, F. and D.S. Weld. “Automatically refining the wikipedia infobox ontology”. 17th International Conference on World Wide Web, 2008.
-  Dung, T.Q. and W. Kameyama. Ontology-Based Information Extraction and Information Retrieval in Health Care Domain. 2007.
-  Hwang, C., E. Franconi and M. Kifer. “Incompletely and imprecisely speaking: using dynamic ontologies for representing and retrieving information”. 6th International Workshop on Knowledge Representation Meets Databases, 1999.
-  Adrian, B., et al. “iDocument: using ontologies for extracting and annotating information from unstructured text”. 32nd Annual German Conference on AI, 2009.