Article title

Evaluating lexicographer controlled semi-automatic word sense disambiguation method in a large scale experiment

Word Sense Disambiguation (WSD) in text remains a difficult problem, because the best supervised methods require laborious and costly manual preparation of training data. Unsupervised methods, on the other hand, yield significantly lower precision, and their results are not satisfactory for many applications. Recently, a weakly supervised WSD algorithm called Lexicographer-Controlled Semi-automatic Sense Disambiguation (LexCSD) was proposed. The method clusters text snippets that contain the word in focus. For each cluster a core is identified; a human labels the core with a word sense, and the labelled core is used to build a classifier. Classifiers, constructed separately for each word, are then applied to text. The goal of this work is to evaluate LexCSD trained on a large volume of untagged text. A comparison showed that the approach outperforms the most-frequent-sense baseline in most cases.
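The pipeline outlined in the abstract (cluster snippets, pick a core per cluster, have a human label the core, build a per-word classifier) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the published LexCSD implementation: the bag-of-words features, the toy k-means with farthest-first seeding, the nearest-centroid classifier, and all function names (`bow`, `cluster`, `cores`, `train`, `classify`, `ask_lexicographer`) are hypothetical, and the example data is invented.

```python
from collections import Counter
import math

def bow(snippet, focus):
    """Bag-of-words vector for a snippet, with the focus word dropped."""
    return Counter(t for t in snippet.lower().split() if t != focus)

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    total = Counter()
    for v in vectors:
        total.update(v)
    return total

def cluster(vectors, k, iters=10):
    """Toy k-means on cosine similarity (assumed stand-in for the real clustering)."""
    seeds = [vectors[0]]
    while len(seeds) < k:  # farthest-first seeding keeps the demo deterministic
        seeds.append(min(vectors, key=lambda v: max(cosine(v, s) for s in seeds)))
    cents = [Counter(s) for s in seeds]
    assign = [0] * len(vectors)
    for _ in range(iters):
        assign = [max(range(k), key=lambda c: cosine(v, cents[c])) for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                cents[c] = centroid(members)
    return assign, cents

def cores(vectors, assign, cents, size):
    """Core of a cluster: the `size` members most similar to its centroid."""
    result = {}
    for c, cent in enumerate(cents):
        members = [i for i, a in enumerate(assign) if a == c]
        members.sort(key=lambda i: cosine(vectors[i], cent), reverse=True)
        result[c] = members[:size]
    return result

def train(snippets, focus, k, ask_lexicographer, core_size=2):
    """Cluster snippets, let a human label each core, keep labelled core centroids."""
    vectors = [bow(s, focus) for s in snippets]
    assign, cents = cluster(vectors, k)
    model = []
    for idx in cores(vectors, assign, cents, core_size).values():
        if idx:
            sense = ask_lexicographer([snippets[i] for i in idx])
            model.append((sense, centroid([vectors[i] for i in idx])))
    return model

def classify(model, focus, snippet):
    """Assign the sense whose labelled core is most similar to the snippet."""
    v = bow(snippet, focus)
    return max(model, key=lambda m: cosine(v, m[1]))[0]

# Invented example data for the focus word "bank".
snippets = [
    "bank loan money account",
    "river bank water mud",
    "money deposit bank account",
    "fish river bank water",
    "bank loan interest money",
    "water reeds river bank",
]

def stub_lexicographer(core_snippets):
    # Stands in for the human who inspects a core and names its sense.
    return "finance" if "money" in " ".join(core_snippets) else "river"

model = train(snippets, "bank", 2, stub_lexicographer)
print(classify(model, "bank", "bank charged interest on the loan"))
print(classify(model, "bank", "muddy water by the bank"))
```

Only the cores are shown to the labeller, so the manual effort scales with the number of clusters rather than with the number of snippets, which is the point of the weakly supervised setup described above.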
Physical description
Bibliography: 39 items.
  • Institute of Informatics, Wrocław University of Technology, Poland
  • Abney, S. (2008) Semisupervised Learning for Computational Linguistics. Chapman & Hall/CRC.
  • Agirre, E. and Edmonds, P., eds. (2006) Word Sense Disambiguation: Algorithms and Applications. Springer.
  • Agirre, E. and Soroa, A. (2007) Evaluating word sense induction and discrimination systems. In: Proc. of the 4th International Workshop on Semantic Evaluations (SemEval-2007), Association for Computational Linguistics, 7-12.
  • Agirre, E. and Stevenson, M. (2006) Knowledge Sources for Word Sense Disambiguation. In: Word Sense Disambiguation: Algorithms and Applications. Springer.
  • Aha, D.W., Kibler, D. and Albert, M.K. (1991) Instance-based learning algorithms. Machine Learning, 6(1), 37-66.
  • Artstein, R. and Poesio, M. (2008) Inter-coder agreement for computational linguistics. Computational Linguistics, 34(4), 555-596.
  • Baś, D., Broda, B. and Piasecki, M. (2008) Towards Word Sense Disambiguation of Polish. In: 3rd Int. Symp. Advances in AI and Applications. IMCSIT, 73-78.
  • Breiman, L. (2001) Random forests. Machine Learning, 45(1), 5-32.
  • Broda, B. and Mazur, W. (2009) Evaluation of Clustering Algorithms for Polish Word Sense Disambiguation. In: 5th Int. Symp. Adv. in AI and Applications. IEEE, 25-32.
  • Broda, B. and Piasecki, M. (2009) Semi-supervised Word Sense Disambiguation Based on Weakly Controlled Sense Induction. In: 4th Int. Symp. Adv. in AI and Applications. IEEE, 17-24.
  • Broda, B., Piasecki, M. and Maziarz, M. (2010a) Evaluating LexCSD - a Weakly-Supervised Method on Improved Semantically Annotated Corpus in a Large Scale Experiment. In: Intelligent Information Systems. Wydawnictwo Akademii Podlaskiej, Siedlce, 63-76.
  • Broda, B., Piasecki, M. and Szpakowicz, S. (2010b) Extraction of Polish Noun Senses from Large Corpora by Means of Clustering. Control and Cybernetics, 39(2), 401-420.
  • Fellbaum, C. et al. (1998) WordNet: An electronic lexical database. MIT Press, Cambridge, MA.
  • Freund, Y. and Schapire, R.E. (1996) Experiments with a New Boosting Algorithm. In: ICML, 148-156.
  • Harris, Z.S. (1968) Mathematical Structures of Language. Interscience Publishers, New York.
  • Karypis, G. (2002) CLUTO a clustering toolkit. Tech. report, Univ. of Minnesota.
  • Kilgarriff, A. (1997) The hard parts of lexicography. International Journal of Lexicography, 11(1), 51-54.
  • Kilgarriff, A. (2006) Word Senses. In: Word Sense Disambiguation: Algorithms and Applications. Springer.
  • Kilgarriff, A., Husák, M., McAdam, K., Rundell, M. and Rychlý, P. (2008) GDEX: Automatically finding good dictionary examples in a corpus. In: Proceedings of EURALEX. Universitat Pompeu Fabra, 425-432.
  • Kilgarriff, A. and Koeling, R. (2003) An Evaluation of a Lexicographer’s Workbench Incorporating Word Sense Disambiguation. In: Gelbukh, A.F., ed., CICLing. LNCS 2588, Springer, 225-240.
  • Kohavi, R. (1995) The power of decision tables. Machine Learning: ECML-95, LNCS 912, 174-189.
  • Landauer, T.K. and Dumais, S.T. (1997) A Solution to Plato’s Problem: The Latent Semantic Analysis Theory of Acquisition. Psychological Review, 104(2), 211-240.
  • Lund, K. and Burgess, C. (1996) Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments, & Computers, 28(2), 203-208.
  • Mihalcea, R. (2003) The Role of Non-Ambiguous Words in Natural Language Disambiguation. In: Proceedings of the Fourth RANLP. John Benjamins, 357-366.
  • Młodzki, R. and Przepiórkowski, A. (2009) The WSD Development Environment. In: Vetulani, Z., ed., Proc. 4th Language and Technology Conference, Poznań, Poland. Wydawnictwo Poznańskie, Poznań, 245-250.
  • Navigli, R. (2009) Word sense disambiguation: A survey. ACM Comput. Surv., 41(2), 1-69.
  • Pantel, P. (2003) Clustering by committee. Ph.D. thesis, Edmonton, Alta., Canada.
  • Pedersen, T. (2006) Unsupervised Corpus Based Methods for WSD. In: Word Sense Disambiguation: Algorithms and Applications. Springer, 133-166.
  • Pedersen, T. (2010) Computational Approaches to Measuring the Similarity of Short Contexts: A Review of Applications and Methods. The Computing Research Repository, abs/0806.3787.
  • Pedersen, T. and Kulkarni, A. (2006) Automatic cluster stopping with criterion functions and the Gap Statistic. In: Proceedings of the Demo Session of NAACL. ACL, 276-279.
  • Piasecki, M., Szpakowicz, S. and Broda, B. (2009) A WordNet from the Ground Up. Oficyna Wydawnicza Politechniki Wrocławskiej.
  • Przepiórkowski, A. (2004) The IPI PAN Corpus: Preliminary version. Institute of Computer Science PAS, Warsaw.
  • Przepiórkowski, A. (2006) The Potential of the IPI PAN Corpus. Poznań Studies in Contemporary Linguistics, 41, 31-48.
  • Quinlan, J.R. (1993) C4.5: Programs for Machine Learning. Morgan Kaufmann.
  • Schütze, H. (1998) Automatic word sense discrimination. Computational Linguistics, 24(1), 97-123.
  • Settles, B. (2009) Active Learning Literature Survey. Computer Sciences Technical Report 1648, University of Wisconsin-Madison.
  • Vapnik, V.N. (1995) The Nature of Statistical Learning Theory. Springer Verlag.
  • Weiss, D. (2008) Korpus Rzeczpospolitej [on-line] http://www.cs.put.poznan.pl/dweiss/rzeczpospolita, corpus of texts from the online edition of Rzeczpospolita.
  • Yarowsky, D. (1993) One sense per collocation. In: Proceedings of the Workshop on Human Language Technology. ACL, 266-271.