Adaptive information extraction from structured text documents

Ożdżyński, P.; Zakrzewska, D.

Abstract

Effective analysis of structured documents can determine the performance of management information systems. The paper considers an adaptive method of information extraction from structured text documents. We assume that documents belong to thematic groups and that the required set of information can be determined a priori. Knowledge of the document structure makes it possible to indicate the blocks in which certain information is more likely to appear. As a result, structured data are obtained that can be analysed further. The proposed solution uses dictionaries and inflection analysis, and may be applied to Polish texts. The presented approach can be used for information extraction from official letters, information sheets and product specifications.

Keywords

natural language processing; information extraction; tagging; named entity recognition

Publisher

Wydawnictwo Szkoły Głównej Gospodarstwa Wiejskiego w Warszawie

Journal

Information Systems in Management

Year

2014

Volume

Vol. 3, No. 4

Pages

261-272

Physical description

Bibliography: 15 items; figures, tables, charts.

Contributors

author

Ożdżyński P.

Institute of Information Technology, Lodz University of Technology

author

Zakrzewska D.

Institute of Information Technology, Lodz University of Technology

Bibliography

[1] Kosala L., Blockeel H., Bruynooghe M., Van den Bussche J. (2006) Information Extraction from Structured Documents Using k-testable Tree Automaton Inference, Data & Knowledge Engineering 58, 129-158.
[2] Kanya N., Ravi T. (2012) Modeling and Techniques in Named Entity Recognition - An Information Extraction Task, Third International Conference on Sustainable Energy and Intelligent Systems, Tamilnadu, India, 27-29 December.
[3] Zhu Junwu, Jiang Yi, Xu Yingying (2009) Automatic Knowledge Acquire System Oriented to Web Pages, Proc. of the 3rd International Conference on Intelligent Information Technology Application, 21-22 Nov., Yangzhou University Yangzhou, China, 487-490.
[4] Cvitaš A. (2011) Relation Extraction from Text Documents, Proc. of the 34th International Convention MIPRO 2011, May 23-27, Opatija, Croatia, 1565-1570.
[5] Fang Luo, Pei Fang, Qizhi Qiu, Han Xiao (2012) Features Induction for Product Named Entity Recognition with CRFs, Proc. of the 2012 IEEE 16th International Conference on Computer Supported Cooperative Work in Design, 491-496.
[6] Xu Qiuyan, Li Fang (2011) Joint Learning of Named Entity Recognition and Relation Extraction, 2011 International Conference on Computer Science and Network Technology, 1978-1982.
[7] Cheng Ziguang, Zheng Dequan, Li Sheng (2013) Multi-Pattern Fusion Based Semi-Supervised Name Entity Recognition, Proc. of the 2013 International Conference on Machine Learning and Cybernetics, Tianjin, 14-17 July, 45-49.
[8] Zhu Jianhan (2009) An Adaptive Approach for Web Scale Named Entity Recognition, 1st IEEE Symposium on Web Society 2009, 41-46.
[9] Todorović B.T., Rančić S.R., Marković I.M., Mulalić E.H., Ilić V.M. (2008) Named Entity Recognition and Classification using Context Hidden Markov Model, 9th Symposium on Neural Network Applications in Electrical Engineering, September 25-27.
[10] Chan Shing-Kit, Lam Wai (2007) Efficient Methods for Biomedical Named Entity Recognition, Proc. of the 7th IEEE International Conference on Bioinformatics & Bioengineering, Boston MA, October 14-17, 729-735.
[11] Liao Zhihua, Wu Hongguang (2012) Biomedical Named Entity Recognition based on Skip-Chain CRFS, 2012 International Conference on Industrial Control and Electronics Engineering, 1495-1498.
[12] Keretna S., Lim Ch. P., Creighton D. (2014) A Hybrid Model for Named Entity Recognition Using Unstructured Medical Text, Proc. of the 2014 9th International Conference on System of Systems Engineering, Adelaide Australia, June 9-13, 85-90.
[13] Debole F., Sebastiani F. (2005) An analysis of the relative hardness of Reuters-21578 subsets, J. Am. Soc. Inf. Sci. Technol., 56/2005, 584-596.
[14] Sukanya M., Biruntha S. (2012) Techniques on text mining, Proc. of the IEEE Int. Conference on Advanced Communication Control and Computing Technologies, 269-271.
[15] Ożdżyński P. (2014) Text document categorization based on word frequent sequence mining, Information Systems Architecture and Technology, Contemporary Approaches to Design and Evaluation of Information Systems, Oficyna Wydawnicza Politechniki Wrocławskiej, 129-138.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-0e58f4ae-a40d-42ed-b732-c49661f39720