Article title

Collecting Polish-German parallel corpora in the Web

Publication language
EN
Abstract
EN
Parallel corpora have recently become an indispensable resource in multilingual natural language processing. Manual preparation of a bilingual corpus is laborious, so methods for automated parallel corpus creation are an active research topic. A number of sophisticated and effective algorithms for collecting parallel texts from the Web have already been developed. However, none of them has been applied to the creation of a Polish-German corpus. The aim of this research was therefore threefold: to verify the efficiency of existing algorithms for collecting a Polish-German parallel corpus intended as a reference source for a machine translation system, to propose a new algorithm, and to present the results achieved with it.
Pages
41-45
Physical description
Bibliography: 11 items.
Bibliography
  • [1] Brown P., Cocke J., Della Pietra S., Della Pietra V., Jelinek F., Mercer R., Roossin P., A statistical approach to machine translation, Computational Linguistics, 16(2), 1990, pp. 79-85.
  • [2] Chen J., Nie J.-Y., Web parallel text mining for Chinese-English cross-language information retrieval, International Conference on Chinese Language Computing, Chicago, Illinois, 2000.
  • [3] Gale W.A., Church K.W., Identifying word correspondences in parallel texts, Fourth DARPA Workshop on Speech and Natural Language, Asilomar, California, 1991.
  • [4] Levenshtein V.I., Binary codes capable of correcting deletions, insertions and reversals, Doklady Akademii Nauk SSSR, 1965.
  • [5] Ma X., Liberman M., BITS: A method for bilingual text search over the Web, Machine Translation Summit VII, 1999.
  • [6] McEnery T., Wilson A., Corpus Linguistics, Edinburgh University Press, 1996.
  • [7] Melamed I. Dan, Models of translational equivalence among words, Computational Linguistics, 26(2), 2000, pp. 221-249.
  • [8] Resnik P., Parallel strands: A preliminary investigation into mining the Web for bilingual text, Proceedings of the Third Conference of the Association for Machine Translation in the Americas, 1998.
  • [9] Resnik P., Smith N., The Web as a Parallel Corpus, University of Maryland technical report UMIACS-TR-2002, 2002.
  • [10] Sinclair J., Corpus Concordance Collocation, Oxford University Press, 1991.
  • [11] Yianilos P.N., Kanzelberger K.G., The LikeIt intelligent string comparison facility, Technical Report, NEC Research Institute, 1997.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-article-BAT5-0042-0015