2008 | Vol. 34, no 4 | 41-45
Article title

Collecting Polish-German parallel corpora in the Web

Parallel corpora have recently become an indispensable resource in multilingual natural language processing. Manual preparation of a bilingual corpus is laborious, so methods for the automated creation of parallel corpora are currently of interest to many researchers. A number of sophisticated and effective algorithms for collecting parallel texts from the Web have already been created. Unfortunately, none of them has been used to create a Polish-German corpus. The aim of this research is therefore threefold: to verify the efficiency of existing algorithms for collecting a Polish-German parallel corpus, intended as a reference source for a Machine Translation system; to propose a new algorithm; and to present the results achieved by the new algorithm.
Physical description
Bibliography: 11 items.
  • [1] Brown P., Cocke J., Della Pietra S., Della Pietra V., Jelinek F., Mercer R., Roossin P., A statistical approach to machine translation, Computational Linguistics, 16(2), 1990, pp. 79-85.
  • [2] Chen J., Nie J.-Y., Web parallel text mining for Chinese-English cross-language information retrieval, International Conference on Chinese Language Computing, Chicago, Illinois, 2000.
  • [3] Gale W.A., Church K.W., Identifying word correspondences in parallel texts, Fourth DARPA Workshop on Speech and Natural Language, Asilomar, California, 1991.
  • [4] Levenshtein V.I., Binary codes capable of correcting deletions, insertions and reversals, Doklady Akademii Nauk SSSR, 1965.
  • [5] Ma X., Liberman M., Bits: A method for bilingual text search over the web, Machine Translation Summit VII, 1999.
  • [6] McEnery T., Wilson A., Corpus Linguistics, Edinburgh University Press, 1996.
  • [7] Melamed I.D., Models of translational equivalence among words, Computational Linguistics, 26(2), 2000, pp. 221-249.
  • [8] Resnik P., Parallel strands: A preliminary investigation into mining the Web for bilingual text, Proceedings of the Third Conference of the Association for Machine Translation in the Americas, 1998.
  • [9] Resnik P., Smith N., The Web as a Parallel Corpus, University of Maryland technical report UMIACS-TR-2002, 2002.
  • [10] Sinclair J., Corpus Concordance Collocation, Oxford University Press, 1991.
  • [11] Yianilos P.N., Kanzelberger K.G., The LikeIt intelligent string comparison facility, Technical Report, NEC Research Institute, 1997.