Journal
2008 | Vol. 34, no 4 | 7-17
Article title

Morphological prediction for Polish by a statistical a tergo index

Title variants
Publication languages
EN
Abstracts
EN
We present a direct method to construct a morpho-syntactic guesser for Polish. Such a guesser produces morpho-syntactic descriptions for word forms unknown to the morphological analyser. The method relies upon a statistical a tergo index, in which pseudosuffixes (endings) extracted from a statistical tree define morpho-syntactic properties of corresponding word forms. The secondary aim is to investigate to what extent it is possible to develop the morphological analysis exclusively on the basis of endings. A statistically extracted a tergo index of Polish word forms was created. Various experiments giving insights into the properties of the index are presented. The method seems to be easily applicable to any other inflectional language with only minor technical changes.
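The idea of guessing morpho-syntactic descriptions from word endings can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract describes a statistical tree from which pseudosuffixes are extracted, whereas the sketch below simply counts tags for every word-final substring and answers with the longest known ending. The function names `build_suffix_index` and `guess_tags`, the `max_len` cut-off, the toy word list, and the simplified tag labels are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Toy training data: (word form, simplified tag). The tags are hypothetical
# labels for illustration, not the tagset used in the paper.
TOY_FORMS = [
    ("domami", "noun:pl:inst"),
    ("kotami", "noun:pl:inst"),
    ("oknami", "noun:pl:inst"),
    ("koni", "noun:pl:gen"),
    ("czytałem", "verb:past:sg:1"),
    ("pisałem", "verb:past:sg:1"),
    ("robiłem", "verb:past:sg:1"),
    ("dobrego", "adj:sg:gen"),
    ("nowego", "adj:sg:gen"),
]


def build_suffix_index(tagged_forms, max_len=4):
    """Map each word-final substring (1..max_len characters) to tag counts."""
    index = defaultdict(Counter)
    for form, tag in tagged_forms:
        form = form.lower()
        for n in range(1, min(max_len, len(form)) + 1):
            index[form[-n:]][tag] += 1
    return index


def guess_tags(index, form, max_len=4):
    """Return (tag, relative frequency) pairs for the longest known ending.

    Endings are tried from longest to shortest; an empty list means that
    no ending of the form occurs in the index.
    """
    form = form.lower()
    for n in range(min(max_len, len(form)), 0, -1):
        counts = index.get(form[-n:])
        if counts:
            total = sum(counts.values())
            return [(tag, count / total) for tag, count in counts.most_common()]
    return []


index = build_suffix_index(TOY_FORMS)
print(guess_tags(index, "stołami"))    # falls back to the ending "ami"
print(guess_tags(index, "widziałem"))  # matches the ending "ałem"
```

Backing off from the longest ending to shorter ones keeps the guess as specific as the data allows, while very short endings yield several competing tags. A practical guesser would also need frequency thresholds and a much larger lexicon than this toy list.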
Publisher

Journal
Year
Pages
7-17
Physical description
Bibliography: 24 items, charts.
Authors
author
Bibliography
  • [1] Wolinski M., Morfeusz - a practical tool for the morphological analysis of Polish, pp. 511-520.
  • [2] Piasecki M., Godlewski G., Pejcz J., Corpus of medical texts and tools, [in:] Proceedings of Medical Informatics and Technologies 2006, Silesian University of Technology, 2006, pp. 281-286.
  • [3] Godlewski G., Piasecki M., Sas J., Application of syntactic properties to three-level recognition of Polish hand-written medical texts, [in:] D.F. Brailsford (ed.), Proc. 2006 ACM Symposium on Document Engineering, ACM, 2006, pp. 115-121.
  • [4] Bień J.S., Koncepcja słownikowej informacji morfologicznej i jej komputerowej weryfikacji, Wyd. UW, Warszawa, 1991.
  • [5] Brill E., Some advances in transformation-based part of speech tagging, [in:] Proceedings of the Twelfth National Conference on Artificial Intelligence (AAAI-94), 1994.
  • [6] Mikheev A., Automatic rule induction for unknown-word guessing, Computational Linguistics, 23(3), 1997, pp. 405-423.
  • [7] Schone P., Jurafsky D., Knowledge-free induction of inflectional morphologies, [in:] Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL-2001), 2001.
  • [8] Novak A., Nagy V., Oravecz C., Combining symbolic and statistical methods in morphological analysis and unknown word guessing, [in:] Proceedings of LREC’04, 2004.
  • [9] Kazakov D., Achievements and prospects of learning word morphology with inductive logic programming, [in:] Learning Language in Logic, LNCS, Springer, 2004, pp. 89-111.
  • [10] Daciuk J., Treatment of unknown words, [in:] Proceedings of Workshop on Implementing Automata WIA’99, Potsdam, Germany, 1999, Vol. 2214 of LNCS, Springer Verlag, 2001, pp. 71-80.
  • [11] Szafran K., SAM-96 - the morphological analyser for Polish, [in:] A. Narin’yani (ed.), Proceedings of International Workshop DIALOGUE’97 Computational Linguistics and its Applications, Yasnaya Polyana, Russia, June 10-15, 1997, Moscow, 1997, pp. 304-308.
  • [12] Tokarski J., Schematyczny indeks a tergo polskich form wyrazowych, Warszawa, 2002.
  • [13] Rudolf M., Metody automatycznej analizy korpusu tekstów polskich, Uniwersytet Warszawski, Wydz. Polonistyki, 2004.
  • [14] Džeroski S., Erjavec T., Learning to lemmatise Slovene words, [in:] J. Cussens, S. Džeroski (eds.), Proceedings of LLL’99, Vol. 1925 of LNAI, Springer, 2000, pp. 69-88.
  • [15] Pavlovič-Lažetić G., Vitas D., Krstev C., Towards full lexical recognition, Proc. 7th Int. Conf., TSD 2004, Brno, Czech Republic, September 8-11, 2004, Volume 3206 of LNCS, 2004, pp. 179-186.
  • [16] Hlaváčová J., Morphological guesser of Czech words, [in:] Proceedings of Text, Speech, and Dialogue, 2001, Vol. 2166 of LNAI, Springer, 2001, pp. 70-75.
  • [17] Przepiórkowski A., The IPI PAN Corpus: Preliminary version, Institute of Computer Science PAS, 2004.
  • [18] Godlewski G., Piasecki M., Pejcz J., Corpus of medical texts and tools, Silesian University of Technology, 2006, pp. 273-280.
  • [19] Weiss D., Korpus Rzeczpospolitej. http://www.cs.put.poznan.pl/dweiss/rzeczpospolita/
  • [20] Stowarzyszenie Wikimedia Polska: Wikipedia - the free encyclopedia, http://pl.wikipedia.org/ (2008)
  • [21] Manning C.D., Schütze H., Foundations of Statistical Natural Language Processing, The MIT Press, 2001.
  • [22] Piasecki M., Polish tagger TaKIPI: Rule based construction and optimisation, Task Quarterly, 11(1-2), 2007, pp. 151-167.
  • [23] Sojka P., Kopecek I., Pala K. (eds.), Proceedings of the Text, Speech and Dialog 2006 Conference, Lecture Notes in Artificial Intelligence, Springer, 2006.
  • [24] Kłopotek M.A., Wierzchoń S.T., Trojanowski K. (eds.), Intelligent Information Processing and Web Mining, Proceedings of the International IIS: IIPWM’06 Conference held in Wisła, Poland, June 2006, Advances in Soft Computing, Springer, Berlin, 2006.
Document type
Bibliography
Identifiers
YADDA identifier
bwmeta1.element.baztech-article-BAT5-0042-0011