Statistical proper name recognition in Polish economic texts

Authors Marcińczuk, M.; Piasecki, M.
Publication language EN
EN In the paper we present a Proper Name Recognition algorithm based on the Hidden Markov Model (HMM). Recognition of Proper Names (PN) is treated as the basis for the Named Entity Recognition problem in general. The proposed method combines a domain-dependent method based on HMM with domain-independent methods based on gazetteers and hand-written rules for recognition and post-processing that capture the general properties of Polish PN structure. A large gazetteer with morphologically described entries was acquired from the web. An HMM re-scoring mechanism was applied as the basis for integrating different knowledge sources in PN recognition. Results of experiments on a domain corpus of Polish stock exchange reports, used for training and testing, are presented. Adaptability of the method was analysed in a cross-domain evaluation, applying the trained model to two other domain corpora.
Keywords
EN proper name recognition; named entity recognition; machine learning; hidden Markov model; rule-based approach; dictionary-based approach
Publisher Systems Research Institute, Polish Academy of Sciences
Journal Control and Cybernetics
Year 2011
Volume Vol. 40, no. 2
Pages 393-418
Physical description Bibliography: 25 items.
author Marcińczuk, M.
author Piasecki, M.
  • Wrocław University of Technology, Wrocław, Poland
Collection BazTech
YADDA identifier bwmeta1.element.baztech-article-BATC-0008-0009