

Article title

Polish tagger TaKIPI: rule based construction and optimization

Publication languages
EN
Abstracts
EN
A large number of distinct tags, limited corpora, and free word order are the main reasons why commonly used techniques based on stochastic modeling achieve low accuracy in tagging Polish (that is, in the automatic disambiguation of morphological descriptions). This paper presents the rule-based architecture of the TaKIPI Polish tagger, which combines handwritten and automatically extracted rules. Possibilities for optimizing its parameters and components are discussed, including the use of rule extraction methods other than the C4.5 decision trees applied initially. The main goal of the paper is to explore a range of promising rule-based classifiers and to investigate their impact on tagging accuracy. Simple techniques for combining classifiers are also tested. The experiments show that even a simple combination of different classifiers can increase the tagger's accuracy by almost one percent.
Pages
151-167
Physical description
Bibliography: 24 items, tables.
Contributors
author
  • Institute of Applied Informatics, Wroclaw University of Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wroclaw, Poland, maciej.piasecki@pwr.wroc.pl
Bibliography
  • [1] Przepiórkowski A 2004 The IPI PAN Corpus. Preliminary Version, Institute of Computer Science PAS
  • [2] Manning Ch. D and Schütze H 1999 Foundations of Statistical Natural Language Processing, The MIT Press
  • [3] Przepiórkowski A 2006 The Potential of the IPI PAN Corpus. Poznań Studies in Contemporary Linguistics 41 31
  • [4] Dębowski L 2004 Proc. Int. Conf. on Intelligent Information Processing and Web Mining, Zakopane, Poland (Kłopotek M A, Wierzchoń S T and Trojanowski K, Eds), Springer Verlag, pp. 409-413
  • [5] Voutilainen A 1997 EngCG tagger, Version 2 (Bronsted T and Lytje I, Eds), Sprog og Multimedier, Aalborg Universitetsforlag
  • [6] Oliva K 2003 Contributions 4th Eur. Conf. on Formal Description of Slavic Languages (Kosta P et al., Eds), Peter Lang, pp. 299-314
  • [7] Dębowski L 2001 Internal Report IPI PAN, Instytut Podstaw Informatyki PAN, 934 www.ipipan.waw.pl/staff/l.debowski/raporty/kropka934.pdf
  • [8] Mitchell T M 1997 Machine Learning, WCB/McGraw-Hill
  • [9] Godlewski G and Piasecki M 2006 Proc. Artificial Intelligence Studies (Kłopotek M and Tchórzewski J, Eds), Publishing House of University of Podlasie, pp. 157-164
  • [10] Piasecki M 2006 Text, Speech and Dialogue. Proc. 9th Int. Conf., Brno, Czech Republic (Sojka P, Kopeček I and Pala K, Eds), Springer Verlag, LNAI 4188, pp. 205-212
  • [11] Piasecki M and Godlewski G 2006 Text, Speech and Dialogue. Proc. 9th Int. Conf., Brno, Czech Republic (Sojka P, Kopeček I and Pala K, Eds), Springer Verlag, LNAI 4188, pp. 213-220
  • [12] Piasecki M and Wardyński A 2006 Proc. 1st Int. Symposium Advances in Artificial Intelligence and Applications, Wisła, Poland, pp. 169-178
  • [13] Hajič J, Krbec P, Květoň P, Oliva K and Petkevič V 2001 Proc. 39th Annual Meeting of ACL, Morgan Kaufmann Publishers, pp. 260-267
  • [14] Quinlan J R 1993 C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo
  • [15] Marquez L 1999 Part-of-speech Tagging: A Machine Learning Approach based on Decision Trees, PhD Thesis, Universitat Politecnica de Catalunya
  • [16] Woliński M 2006 Proc. Int. Conf. Intelligent Information Processing and Web Mining, Ustroń, Poland (Kłopotek M A, Wierzchoń S T and Trojanowski K, Eds), Springer Verlag, pp. 511-520
  • [17] Cohen W W 1995 Machine Learning: Proc. 12th Int. Conf., Lake Tahoe, California, Morgan Kaufmann, pp. 115-123
  • [18] Frank E and Witten I H 1998 Proc. 15th Int. Conf. on Machine Learning, Morgan Kaufmann, pp. 144-151
  • [19] Landwehr N, Hall M and Frank E 2005 Machine Learning 59 (1-2) 161
  • [20] Witten I H and Frank E 2005 Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition, Morgan Kaufmann, San Francisco
  • [21] Konchady M 2006 Text Mining Application Programming, Charles River Media
  • [22] Kuncheva L 2004 Combining Pattern Classifiers: Methods and Algorithms, John Wiley & Sons, New Jersey
  • [23] Marquez L, Rodriguez H, Carmona J and Montolio J 1999 Proc. 1999 Joint SIGDAT Conf. on Empirical Methods in Natural Language Processing and Very Large Corpora, Maryland, USA, pp. 53-62
  • [24] Piasecki M 2006 Proc. Multimedia and Network Information Systems (Zgrzywa A, Ed.), Wyd. PWr., pp. 99-107
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-article-BPG4-0035-0057