Polish tagger TaKIPI: rule based construction and optimization

Piasecki, M.

Selected full texts from this journal

https://journal.mostwiedzy.pl/TASKQuarterly

Abstract

A large number of distinct tags, limited corpora, and free word order are the main reasons why commonly used stochastic techniques achieve low accuracy in tagging Polish, that is, in the automatic disambiguation of morphological descriptions. This paper presents the rule-based architecture of the TaKIPI Polish tagger, which combines handwritten and automatically extracted rules. It discusses ways to optimize the tagger's parameters and components, including rule extraction methods other than the C4.5 decision trees applied initially. The main goal is to explore a range of promising rule-based classifiers and investigate their impact on tagging accuracy. Simple techniques for combining classifiers are also tested. The experiments show that even a simple combination of different classifiers can increase the tagger's accuracy by almost one percent.

Keywords

morphosyntactic tagging; Polish; rule-based tagging; decision trees

Publisher

Politechnika Gdańska

Journal

TASK Quarterly : scientific bulletin of Academic Computer Centre in Gdansk

Year

2007

Volume

Vol. 11, No 1-2

Pages

151-167

Physical description

Bibliography: 24 items, tables.

Contributors

author

Piasecki M.

Institute of Applied Informatics, Wroclaw University of Technology, Wybrzeże Wyspiańskiego 27, 50-370 Wroclaw, Poland, maciej.piasecki@pwr.wroc.pl

Bibliography

[1] Przepiórkowski A 2004 The IPI PAN Corpus. Preliminary Version, Institute of Computer Science PAS
[2] Manning Ch. D and Schütze H 1999 Foundations of Statistical Natural Language Processing, The MIT Press
[3] Przepiórkowski A 2006 The Potential of the IPI PAN Corpus. Poznań Studies in Contemporary Linguistics 41 31
[4] Dębowski L 2004 Proc. Int. Conf. on Intelligent Information Processing and Web Mining, Zakopane, Poland (Kłopotek M A, Wierzchoń S T and Trojanowski K, Eds), Springer Verlag, pp. 409-413
[5] Voutilainen A 1997 EngCG tagger, Version 2 (Bronsted T and Lytje I, Eds), Sprog og Multimedier, Aalborg Universitetsforlag
[6] Oliva K 2003 Contributions 4th Eur. Conf. on Formal Description of Slavic Languages (Kosta P et al., Eds), Peter Lang, pp. 299-314
[7] Dębowski L 2001 Internal Report IPI PAN, Instytut Podstaw Informatyki PAN, 934 www.ipipan.waw.pl/staff/l.debowski/raporty/kropka934.pdf
[8] Mitchell T M 1997 Machine Learning, WCB/McGraw-Hill
[9] Godlewski G and Piasecki M 2006 Proc. Artificial Intelligence Studies (Kłopotek M and Tchórzewski J, Eds), Publishing House of University of Podlasie, pp. 157-164
[10] Piasecki M 2006 Text, Speech and Dialogue. Proc. 9th Int. Conf., Brno, Czech Republic (Sojka P, Kopeček I and Pala K, Eds), Springer Verlag, LNAI 4188, pp. 205-212
[11] Piasecki M and Godlewski G 2006 Text, Speech and Dialogue. Proc. 9th Int. Conf., Brno, Czech Republic (Sojka P, Kopeček I and Pala K, Eds), Springer Verlag, LNAI 4188, pp. 213-220
[12] Piasecki M and Wardyński A 2006 Proc. 1st Int. Symposium Advances in Artificial Intelligence and Applications, Wisła, Poland, pp. 169-178
[13] Hajič J, Krbec P, Květoň P, Oliva K and Petkevič V 2001 Proc. 39th Annual Meeting of ACL, Morgan Kaufmann Publishers, pp. 260-267
[14] Quinlan J R 1993 C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo
[15] Marquez L 1999 Part-of-speech Tagging: A Machine Learning Approach based on Decision Trees, PhD Thesis, Universitat Politecnica de Catalunya
[16] Woliński M 2006 Proc. Int. Conf. Intelligent Information Processing and Web Mining, Ustroń, Poland (Kłopotek M A, Wierzchoń S T and Trojanowski K, Eds), Springer Verlag, pp. 511-520
[17] Cohen W W 1995 Machine Learning: Proc. 12th Int. Conf., Lake Tahoe, California, Morgan Kaufmann, pp. 115-123
[18] Frank E and Witten I H 1998 Proc. 15th Int. Conf. on Machine Learning, Morgan Kaufmann, pp. 144-151
[19] Landwehr N, Hall M and Frank E 2005 Machine Learning 59 (1-2) 161
[20] Witten I H and Frank E 2005 Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition, Morgan Kaufmann, San Francisco
[21] Konchady M 2006 Text Mining Application Programming, Charles River Media
[22] Kuncheva L 2004 Combining Pattern Classifiers: Methods and Algorithms, John Wiley & Sons, New Jersey
[23] Marquez L, Rodriguez H, Carmona J and Montolio J 1999 Proc. 1999 Joint SIGDAT Conf. on Empirical Methods in Natural Language Processing and Very Large Corpora, Maryland, USA, pp. 53-62
[24] Piasecki M 2006 Proc. Multimedia and Network Information Systems (Zgrzywa A, Ed.), Wyd. PWr., pp. 99-107

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BPG4-0035-0057