Control and Cybernetics

Article title

Statistical proper name recognition in Polish economic texts

Authors Marcińczuk, M.; Piasecki, M.
Publication languages EN
EN In this paper we present a Proper Name Recognition algorithm based on the Hidden Markov Model (HMM). Recognition of Proper Names (PNs) is treated as the basis for the Named Entity Recognition problem in general. The proposed method combines a domain-dependent HMM-based method with domain-independent methods based on gazetteers and on hand-written rules for recognition and post-processing, which capture the general properties of Polish PN structure. A large gazetteer with morphologically described entries was acquired from the web. An HMM re-scoring mechanism was applied as the basis for integrating the different knowledge sources in PN recognition. We present results of experiments on a domain corpus of Polish stock exchange reports, which was used for both training and testing. The adaptability of the method was analysed in a cross-domain evaluation, in which the trained model was applied to two other domain corpora.
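As a rough illustration of the kind of HMM-based tagging the abstract refers to, the sketch below runs Viterbi decoding over three states (O, B, I) to mark proper-name spans. It is not the authors' implementation: the probabilities are hand-picked toy values, the `GAZETTEER` set and the `observe` and `viterbi` helpers are hypothetical names, and the re-scoring mechanism and hand-written rules mentioned in the abstract are not modelled. The gazetteer is only used here to choose the observation class of a token.

```python
import math

# Toy HMM for proper-name tagging. All numbers are illustrative
# assumptions, not values from the paper.
STATES = ("O", "B", "I")  # outside, beginning, inside a proper name

START = {"O": 0.8, "B": 0.2, "I": 0.0}
TRANS = {
    "O": {"O": 0.8, "B": 0.2, "I": 0.0},
    "B": {"O": 0.5, "B": 0.1, "I": 0.4},
    "I": {"O": 0.6, "B": 0.1, "I": 0.3},
}
# Emission probabilities over three coarse observation classes.
EMIT = {
    "O": {"lower": 0.85, "cap": 0.10, "gaz": 0.05},
    "B": {"lower": 0.05, "cap": 0.35, "gaz": 0.60},
    "I": {"lower": 0.10, "cap": 0.50, "gaz": 0.40},
}

GAZETTEER = {"Wrocław", "Warszawa"}  # hypothetical two-entry gazetteer


def observe(token):
    """Map a token to a coarse observation class."""
    if token in GAZETTEER:
        return "gaz"
    if token[:1].isupper():
        return "cap"
    return "lower"


def log(p):
    """Log-probability that maps zero to minus infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(tokens):
    """Return the most probable state sequence for the tokens."""
    obs = [observe(t) for t in tokens]
    if not obs:
        return []
    score = [{s: log(START[s]) + log(EMIT[s][obs[0]]) for s in STATES}]
    back = []
    for o in obs[1:]:
        prev = score[-1]
        cur, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for s at this position.
            best = max(STATES, key=lambda p: prev[p] + log(TRANS[p][s]))
            cur[s] = prev[best] + log(TRANS[best][s]) + log(EMIT[s][o])
            ptr[s] = best
        score.append(cur)
        back.append(ptr)
    # Follow back-pointers from the best final state.
    path = [max(STATES, key=lambda s: score[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


print(viterbi(["firma", "Wrocław", "rośnie"]))
```

With these toy parameters the gazetteer hit is tagged as the beginning of a proper name and the surrounding lowercase tokens as outside. A trained model would estimate the transition and emission tables from an annotated corpus instead.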
Keywords
EN proper name recognition; named entity recognition; machine learning; hidden Markov model; rule-based approach; dictionary-based approach
Publisher Systems Research Institute, Polish Academy of Sciences
Journal Control and Cybernetics
Year 2011
Volume Vol. 40, no. 2
Pages 393-418
Physical description Bibliography: 25 items.
author Marcińczuk, M.
author Piasecki, M.
  • Wrocław University of Technology, Wrocław, Poland
Abramowicz, W., Filipowska, A., Piskorski, J., Wecel, K. and Wieloch, K. (2006) Linguistic Suite for Polish Cadastral System. In: Proceedings of the LREC’06. ELRA, Genoa, Italy, 53-58.
Alias-i. (2008) LingPipe 3.9.0. (October 1, 2008).
Broda, B., Piasecki, M. and Radziszewski, A. (2008) Towards a Set of General Purpose Morphosyntactic Tools for Polish. In: Proceedings of the 16th International Conference Intelligent Information Systems. Academic Publishing House Exit, 441-450.
Carpenter, B. (2006) Character language models for Chinese word segmentation and named entity recognition. In: Proceedings of the 5th ACL Chinese Special Interest Group (SIGHan), Sydney, Australia. ACL, 169-172.
Graliński, F., Jassem, K. and Marcińczuk, M. (2009a) An Environment for Named Entity Recognition and Translation. In: L. Màrquez and H. Somers, eds., Proceedings of the 13th Annual Conference of the European Association for Machine Translation, Barcelona, Spain. EAMT, 88-95.
Graliński, F., Jassem, K. and Marcińczuk, M. (2009b) Named Entity Recognition in Machine Anonymization. In: Kłopotek M. A., Przepiorkowski A. A., Wierzchoń T. and Trojanowski K., eds., Recent Advances in Intelligent Information Systems. Academic Publishing House Exit, 247-260.
Katrenko, S. and Adriaans, P. (2007) Named Entity Recognition for Ukrainian: A Resource-Light Approach. In: Proceedings of the Workshop on Balto-Slavonic Natural Language Processing: Information Extraction and Enabling Technologies. ACL, Prague, Czech Republic, 88-93.
Kravalová, J. and Žabokrtský, Z. (2009) Czech named entity corpus and SVM-based recognizer. In: Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration, Suntec, Singapore. ACL, 194-201.
LDC (2008) ACE (Automatic Content Extraction) English Annotation Guidelines for Entities (Version 6.6). Technical report, Linguistic Data Consortium.
Malouf, R. (2002) Markov models for language-independent named entity recognition. In: Proceedings of the Sixth Conf. on Natural Language Learning (CoNLL-2002). ACL, 183-186.
Marcińczuk, M. (2007) Pattern Acquisition Methods for Information Extraction Systems. Master’s thesis, Blekinge Tekniska Högskola, Sweden.
Marcińczuk, M. and Piasecki, M. (2007) Pattern Extraction for Event Recognition in the Reports of Polish Stockholders. In: Proceedings of the International Multiconference on Computer Science and Information Technology, Wisła, Poland. IMCSIT, 2, 275-284.
Marcińczuk, M. and Piasecki, M. (2010a) Named Entity Recognition in the Domain of Polish Stock Exchange Reports. In: M.A. Kłopotek, M. Marciniak, A. Mykowiecka, W. Penczek and S.T. Wierzchoń, eds., Proceedings of the 18th International Conference Intelligent Information Systems. Wydawnictwo Akademii Podlaskiej, Siedlce, 127-140.
Marcińczuk, M. and Piasecki, M. (2010b) Study on Named Entity Recognition for Polish Based on Hidden Markov Models. In: P. Sojka, A. Horák, I. Kopecek and K. Pala, eds., Proceedings of Text, Speech and Dialogue: 13th International Conference, TSD 2010. LNCS 6231, 142-149.
Marrero, M., Sánchez-Cuadrado, S., Lara, J.M. and Andreadakis, G. (2009) Evaluation of Named Entity Extraction Systems. Research in Computing Science, 41:47-58.
Mykowiecka, A., Kupść, A., Marciniak, M. and Piskorski, J. (2007) Resources for Information Extraction from Polish texts. In: Proceedings of the 3rd Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics. Wydawnictwo Poznańskie, Poznań, 99-103.
Osenova, P. and Kolkovska, S. (2002) Combining the Named-entity Recognition Task and NP Chunking Strategy for Robust Pre-processing. In: Proc. of The 1st Workshop on Treebanks and Linguistic Theories, Sozopol, Bulgaria. Bulgarian Academy of Sciences, 167-182.
Piasecki, M. (2007) Polish Tagger TaKIPI: Rule Based Construction and Optimisation. Task Quarterly, 11(1-2):151-167.
Piasecki, M. and Radziszewski, A. (2007) Polish Morphological Guesser Based on a Statistical A Tergo Index. In: Proceedings of the International Multiconference on Computer Science and Information Technology - 2nd International Symposium Advances in Artificial Intelligence and Applications. IMCSIT, 247-256.
Piskorski, J. (2004a) Extraction of Polish named entities. In: Proceedings of the Fourth International Conference on Language Resources and Evaluation, LREC 2004 (ELR, 2004). ACL, 313-316.
Piskorski, J. (2004b) Named-Entity Recognition for Polish with SProUT. In: L. Bolc, Z. Michalewicz and T. Nishida, eds., Intelligent Media Technology for Communicative Intelligence. LNCS 3490, Springer, 122-133.
Savary, A. and Piskorski, J. (2010) Lexicons and Grammars for Named Entity Annotation in the National Corpus of Polish. In: M.A. Kłopotek, M. Marciniak, A. Mykowiecka, W. Penczek and S.T. Wierzchoń, eds., Intelligent Information Systems. Wydawnictwo Akademii Podlaskiej, Siedlce, 141-154.
Urbańska, D. and Mykowiecka, A. (2005) Multi-words Named Entity Recognition in Polish texts. In: SLOVKO 2005 - Third International Seminar on Computer Treatment of Slavic and East European Languages, Bratislava, Slovakia. VEDA Vydavatel’stvo Slovenskej akademie vied, 208-215.
Woliński, M. (2006) Morfeusz - a Practical Tool for the Morphological Analysis of Polish. In: Proceedings of IIS:IIPWM’06. Springer, 503-512.
Zhou, G. and Su, J. (2002) Named Entity Recognition using an HMM-based Chunk Tagger. In: ACL ’02: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics. ACL, 473-480.
Collection BazTech
YADDA identifier bwmeta1.element.baztech-article-BATC-0008-0009