Article title

Recognition of structured collocations in an inflective language

Identifiers
Title variants
Publication languages
EN
Abstracts
EN
We present a method for extracting structural collocations in an inflective language (Polish). The process is divided into two phases: (1) extraction and filtering of pairs of lemmatised word forms, and (2) structural annotation of the extracted collocations with lexico-syntactic patterns. The pattern templates and parameters are specified manually, but their instances are generated and tested on the corpus automatically. The extracted collocations were evaluated by applying them as rules in the morphosyntactic disambiguation of Polish and by comparing them with a list of two-word expressions extracted from two Polish dictionaries.
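The two phases named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which association measure or pattern language is used, so pointwise mutual information (PMI), the thresholds `min_count` and `min_pmi`, the function names, the toy lemmatised sentences, and the coarse part-of-speech labels are all assumptions made for illustration.

```python
import math
from collections import Counter


def extract_pairs(sentences, min_count=2, min_pmi=1.0):
    """Phase 1 (sketch): count adjacent lemma pairs and keep frequent, high-PMI ones."""
    unigrams = Counter()
    bigrams = Counter()
    for lemmas in sentences:
        unigrams.update(lemmas)
        bigrams.update(zip(lemmas, lemmas[1:]))
    n_tokens = sum(unigrams.values())
    n_pairs = sum(bigrams.values())
    kept = {}
    for (a, b), count in bigrams.items():
        if count < min_count:
            continue  # frequency filter (assumed threshold)
        p_pair = count / n_pairs
        p_a = unigrams[a] / n_tokens
        p_b = unigrams[b] / n_tokens
        pmi = math.log2(p_pair / (p_a * p_b))
        if pmi >= min_pmi:  # association filter (PMI is an assumed measure)
            kept[(a, b)] = pmi
    return kept


def annotate(pairs, tags):
    """Phase 2 (sketch): label each pair with a pattern built from hypothetical POS tags."""
    return {
        (a, b): f"{tags[a]}+{tags[b]}"
        for (a, b) in pairs
        if a in tags and b in tags
    }


# Toy corpus of already lemmatised sentences (hypothetical data).
sentences = [
    ["czerwony", "kartka", "dla", "piłkarz"],
    ["sędzia", "pokazać", "czerwony", "kartka"],
    ["piłkarz", "dostać", "czerwony", "kartka"],
    ["sędzia", "widzieć", "piłkarz"],
]
kept = extract_pairs(sentences)
patterns = annotate(kept, {"czerwony": "adj", "kartka": "subst"})
print(patterns)
```

Working on lemmas rather than surface forms is what lets differently inflected occurrences of the same expression be counted together, which matters for a language such as Polish.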
Journal
Year
Pages
27--36
Physical description
Bibliography: 24 items.
Authors
author
author
  • Wrocław University of Technology, Institute of Informatics, Wyb. Wyspiańskiego 27, 50-370 Wrocław, Poland, bartosz.broda@pwr.wroc.pl
Bibliography
  • [1] Banski P., Moszczynski R., Enhancing an English-Polish Electronic Dictionary for Multiword Expression Research, Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08), ELRA, Marrakech, Morocco, 2008.
  • [2] Broda B., Piasecki M., Radziszewski A., Towards a Set of General Purpose Morphosyntactic Tools for Polish, Proceedings of the Intelligent Information Systems, Zakopane, Poland, June 2008, Exit, Warszawa, 2008.
  • [3] Buczynski A., Pozyskiwanie z internetu tekstów do badań lingwistycznych, MSc thesis, Wydz. Mat., Inform. i Mech., Uniwersytet Warszawski, 2004.
  • [4] Buczynski A., Okninski T., Program Kolokacje, WWW: http://www.mimuw.edu.pl/polszczyzna/kolokacje/ (2006)
  • [5] Derwojedowa M., Piasecki M., Szpakowicz S., Zawisławska M., plWordNet - the Polish Wordnet, WWW: plwordnet.pwr.wroc.pl (2007)
  • [6] Evert S., The Statistics of Word Cooccurrences: Word Pairs and Collocations, PhD thesis, University of Stuttgart.
  • [7] Fleiss J.L., Measuring Nominal Scale Agreement among Many Raters, Psychological Bulletin, 76(5), 1971, pp. 378-382.
  • [8] Israel G., Determining Sample Size, University of Florida Tech. Rep., 1992.
  • [9] Jacquemin C., Spotting and Discovering Terms through Natural Language Processing, The MIT Press, 2001.
  • [10] Kukła P., Tager dla języka polskiego oparty na kombinacji metod statystycznych, MSc thesis, Wydz. Inf. i Zarządz., Politechnika Wrocławska, 2007.
  • [11] Manning C.D., Schütze H., Foundations of Statistical Natural Language Processing, The MIT Press, 2001.
  • [12] Moirón V.M.B., Data-driven identification of fixed expressions and their modifiability, PhD thesis, Rijksuniversiteit Groningen, 2005.
  • [13] Nenadic G., Spasic I., Ananiadou S., Morpho-syntactic clues for terminological processing in Serbian, Proceedings of Workshop on Morphological Processing of Slavic Languages, EACL 2003, Budapest, Hungary, 2003.
  • [14] Pecina P., An extensive empirical study of collocation extraction methods, Proceedings of the ACL Student Research Workshop, Ann Arbor, Michigan, Association for Computational Linguistics, 2005, pp. 13-18.
  • [15] Piasecki M., Hand-written and Automatically Extracted Rules for Polish Tagger, [in:] P. Sojka et al. (eds.), Proc. of the Text, Speech and Dialog 2006, LNAI, Springer, 2006.
  • [16] Piasecki M., Godlewski G., Effective architecture of the Polish tagger, [in:] P. Sojka et al. (eds.), Proc. of the Text, Speech and Dialog 2006, LNAI, Springer, 2006.
  • [17] Piasecki M., Broda B., Semantic Similarity Measure of Polish Nouns Based on Linguistic Features, [in:] W. Abramowicz (ed.), Business Information Systems, 10th International Conference, BIS 2007, Poznań, Poland, April 25-27, 2007, Springer, LNCS 4439, 2007.
  • [18] Piotrowski T., Saloni Z., Kieszonkowy słownik angielsko-polski i polsko-angielski, Wyd. Wilga, Warszawa, 1999.
  • [19] Przepiórkowski A., The IPI PAN Corpus Preliminary Version, Institute of Computer Science PAS, 2004.
  • [20] PWN: Słownik języka polskiego, Published on WWW: http://sjp.pwn.pl/, 2007.
  • [21] Sharoff S., What is at stake: a case study of Russian expressions starting with a preposition, [in:] T. Tanaka, A. Villavicencio, F. Bond, A. Korhonen (eds.), Second ACL Workshop on Multiword Expressions: Integrating Processing, Barcelona, Spain, Association for Computational Linguistics, 2004, pp. 17-23.
  • [22] Smadja F., Retrieving collocations from text: Xtract, Computational Linguistics, 19(1), 1993, pp. 143-177.
  • [23] Spasic I., A Machine Learning Approach to Term Classification, PhD thesis, Information Systems Research Centre, School of Computing, Science and Engineering, University of Salford, Salford, UK, 2004.
  • [24] Zesch T., Gurevych I., Automatically Creating Datasets for Measures of Semantic Relatedness, Proceedings of the Workshop on Linguistic Distances, Association for Computational Linguistics, 2006, pp. 16-24.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-article-BAT5-0042-0013