Morphological prediction for Polish by a statistical a tergo index

Piasecki, M.; Radziszewski, A.

Article - details

Article title

Morphological prediction for Polish by a statistical a tergo index

Authors

Piasecki M., Radziszewski A.

Abstract

We present a direct method to construct a morpho-syntactic guesser for Polish. Such a guesser produces morpho-syntactic descriptions for word forms unknown to the morphological analyser. The method relies upon a statistical a tergo index, in which pseudosuffixes (endings) extracted from a statistical tree define morpho-syntactic properties of corresponding word forms. The secondary aim is to investigate to what extent it is possible to develop the morphological analysis exclusively on the basis of endings. A statistically extracted a tergo index of Polish word forms was created. Various experiments giving insights into the properties of the index are presented. The method seems to be easily applicable to any other inflectional language with only minor technical changes.
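The idea of predicting morpho-syntactic descriptions from word endings can be illustrated with a minimal sketch. It is not the authors' statistical tree: it assumes a simple longest-ending lookup over a toy lexicon, and the names `build_index`, `guess`, `max_len`, and the simplified tags are hypothetical, introduced here for illustration only.

```python
from collections import Counter, defaultdict


def build_index(lexicon, max_len=5):
    """Map every word ending (1..max_len characters) to a Counter of tags.

    `lexicon` is an iterable of (word_form, tag) pairs. This is an
    illustrative stand-in for an a tergo index, not the published method.
    """
    index = defaultdict(Counter)
    for form, tag in lexicon:
        for n in range(1, min(max_len, len(form)) + 1):
            index[form[-n:]][tag] += 1
    return index


def guess(index, form, max_len=5):
    """Return the tags of the longest known ending of `form`, most frequent first."""
    for n in range(min(max_len, len(form)), 0, -1):
        counts = index.get(form[-n:])
        if counts:
            return [tag for tag, _ in counts.most_common()]
    return []  # no ending of this form is known


# Toy lexicon with hypothetical, simplified tags (assumption, not from the paper).
LEXICON = [
    ("kotami", "subst:pl:inst"),
    ("domami", "subst:pl:inst"),
    ("czytali", "praet:pl"),
    ("pisali", "praet:pl"),
]
INDEX = build_index(LEXICON)

print(guess(INDEX, "stołami"))  # matched through the ending "ami"
print(guess(INDEX, "robili"))   # matched through the ending "li"
```

Preferring the longest matching ending is one simple design choice; a statistical index such as the one described in the abstract would instead select endings according to how reliably they predict the descriptions.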

Keywords

morphological guesser; Polish; automatic extraction; corpus linguistics; statistical a tergo index

Publisher

Oficyna Wydawnicza Politechniki Wrocławskiej

Journal

Systems Science

Year

2008

Volume

Vol. 34, no. 4

Pages

7-17

Physical description

Bibliography: 24 items, charts.

Contributors

author

Piasecki M.

author

Radziszewski A.

Wroclaw University of Technology, Institute of Informatics, Wyb. Wyspiańskiego 27, 50-370 Wroclaw, Poland, adam.radziszewski@pwr.wroc.pl

Bibliography

[1] Wolinski M., Morfeusz - a practical tool for the morphological analysis of Polish, pp. 511-520.
[2] Piasecki M., Godlewski G., Pejcz J., Corpus of medical texts and tools, [in:] Proceedings of Medical Informatics and Technologies 2006, Silesian University of Technology, 2006, pp. 281-286.
[3] Godlewski G., Piasecki M., Sas J., Application of syntactic properties to three-level recognition of Polish hand-written medical texts, [in:] D.F. Brailsford (ed.), Proc. 2006 ACM Symposium on Document Engineering, ACM, 2006, pp. 115-121.
[4] Bień J.S., Koncepcja słownikowej informacji morfologicznej i jej komputerowej weryfikacji, Wyd. UW, Warszawa, 1991.
[5] Brill E., Some advances in transformation-based part of speech tagging, [in:] Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), 1994.
[6] Mikheev A., Automatic rule induction for unknown-word guessing, Computational Linguistics, 23(3), 1997, pp. 405-423.
[7] Schone P., Jurafsky D., Knowledge-free induction of inflectional morphologies, [in:] Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL-2001), 2001.
[8] Novak A., Nagy V., Oravecz C., Combining symbolic and statistical methods in morphological analysis and unknown word guessing, [in:] Proceedings of LREC’04, 2004.
[9] Kazakov D., Achievements and prospects of learning word morphology with inductive logic programming, [in:] Learning Language in Logic, LNCS, Springer, 2004, pp. 89-111.
[10] Daciuk J., Treatment of unknown words, [in:] Proceedings of Workshop on Implementing Automata WIA’99, Potsdam, Germany, 1999, Vol. 2214 of LNCS, Springer Verlag, 2001, pp. 71-80.
[11] Szafran K., SAM-96 - the morphological analyser for Polish, [in:] A. Narin’yani (ed.), Proceedings of International Workshop DIALOGUE’97 Computational Linguistics and its Applications, Yasnaya Polyana, Russia, June 10-15, 1997, Moscow, 1997, pp. 304-308.
[12] Tokarski J., Schematyczny indeks a tergo polskich form wyrazowych, Warszawa, 2002.
[13] Rudolf M., Metody automatycznej analizy korpusu tekstów polskich, Uniwersytet Warszawski, Wydz. Polonistyki, 2004.
[14] Džeroski S., Erjavec T., Learning to lemmatise Slovene words, [in:] J. Cussens, S. Džeroski (eds.), Proceedings of LLL’99, Vol. 1925 of LNAI, Springer, 2000, pp. 69-88.
[15] Pavlovič-Lažetić G., Vitas D., Krstev C., Towards full lexical recognition, Proc. 7th Int. Conf., TSD 2004, Brno, Czech Republic, September 8-11, 2004, Volume 3206 of LNCS, 2004, pp. 179-186.
[16] Hlaváčová J., Morphological guesser of Czech words, [in:] Proceedings of Text, Speech, and Dialogue, 2001, Vol. 2166 of LNAI, Springer, 2001, pp. 70-75.
[17] Przepiórkowski A., The IPI PAN Corpus: Preliminary version, Institute of Computer Science PAS, 2004.
[18] Godlewski G., Piasecki M., Pejcz J., Corpus of medical texts and tools, Silesian University of Technology, 2006, pp. 273-280.
[19] Weiss D., Korpus Rzeczpospolitej. http://www.cs.put.poznan.pl/dweiss/rzeczpospolita/
[20] Stowarzyszenie Wikimedia Polska: Wikipedia - the free encyclopedia, http://pl.wikipedia.org/ (2008)
[21] Manning C.D., Schütze H., Foundations of Statistical Natural Language Processing, The MIT Press, 2001.
[22] Piasecki M., Polish tagger TaKIPI: Rule based construction and optimisation, Task Quarterly, 11(1-2), 2007, pp. 151-167.
[23] Sojka P., Kopecek I., Pala K. (eds.), Proceedings of the Text, Speech and Dialog 2006 Conference, Lecture Notes in Artificial Intelligence, Springer, 2006.
[24] Kłopotek M.A., Wierzchoń S.T., Trojanowski K. (eds.), Intelligent Information Processing and Web Mining, Proceedings of the International IIS: IIPWM’06 Conference held in Wisła, Poland, June 2006, Advances in Soft Computing, Springer, Berlin, 2006.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BAT5-0042-0011