Collecting Polish-German parallel corpora in the Web

Rosińska, M.

Abstract

Parallel corpora have recently become an indispensable resource in multilingual natural language processing. Preparing a bilingual corpus manually is laborious, so methods for creating parallel corpora automatically are currently of interest to many researchers. A number of sophisticated and effective algorithms for collecting parallel texts from the Web have already been developed; however, none of them has been applied to the creation of a Polish-German corpus. The aim of this research was therefore to verify how efficiently existing algorithms collect a Polish-German parallel corpus, intended as a reference source for a machine translation system, to propose a new algorithm, and to present the results it achieves.
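The abstract does not describe the individual algorithms. As a rough illustration of the kind of candidate-generation step that Web-mining systems of this kind often start with (for example, the URL-matching stages discussed in [2] and [9]), the Python sketch below pairs crawled pages whose URLs differ only in a language marker. It is not the author's proposed algorithm; the marker list `MARKER_PAIRS`, the function name `candidate_pairs` and the example.com URLs are hypothetical.

```python
import re

# Hypothetical pairs of language markers that Polish and German versions of a
# page might carry in their URLs; real sites use many other conventions.
MARKER_PAIRS = [("pl", "de"), ("pol", "ger"), ("polski", "deutsch")]

# Characters that may delimit a language marker inside a URL.
_DELIMS = r"/._\-=?&"


def _marker_pattern(marker: str) -> re.Pattern:
    """Match `marker` only as a whole URL component (e.g. '/pl/', '_pl.', '=pl')."""
    return re.compile(
        r"(?<=[" + _DELIMS + r"])" + re.escape(marker) + r"(?=[" + _DELIMS + r"]|$)"
    )


def candidate_pairs(urls):
    """Return sorted (Polish URL, German URL) pairs that differ only in a marker.

    Each URL containing a Polish marker is rewritten with the corresponding
    German marker; the pair is kept only if the rewritten URL was also crawled.
    """
    url_set = set(urls)
    pairs = set()
    for url in url_set:
        for pl_marker, de_marker in MARKER_PAIRS:
            pattern = _marker_pattern(pl_marker)
            if pattern.search(url):
                counterpart = pattern.sub(de_marker, url)
                if counterpart in url_set:
                    pairs.add((url, counterpart))
    return sorted(pairs)


# Hypothetical crawl result used only to demonstrate the function.
example_urls = [
    "http://example.com/pl/about.html",
    "http://example.com/de/about.html",
    "http://example.com/pl/contact.html",  # no German counterpart crawled
    "http://example.com/news_pl.html",
    "http://example.com/news_de.html",
]

for pl_url, de_url in candidate_pairs(example_urls):
    print(pl_url, "<->", de_url)
```

Running it prints the two matched pairs; `contact.html` is dropped because no German counterpart was crawled. URL similarity alone admits false positives, so systems of this kind usually add a content-based check (for instance, comparing the HTML structure of both pages) before accepting a pair.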

Keywords

parallel corpus

Publisher

Oficyna Wydawnicza Politechniki Wrocławskiej

Journal

Systems Science

Year

2008

Volume

Vol. 34, no. 4

Pages

41-45

Physical description

Bibliography: 11 items

Contributors

Author

Rosińska M.

Faculty of Mathematics and Computer Science, Adam Mickiewicz University in Poznań, Poznań, Poland, monika.rosinska@gmail.com

References

[1] Brown P., Cocke J., Della Pietra S., Della Pietra V., Jelinek F., Lafferty J., Mercer R., Roossin P., A statistical approach to machine translation, Computational Linguistics, 16(2), 1990, pp. 79-85.
[2] Chen J., Nie J.-Y., Web parallel text mining for Chinese-English cross-language information retrieval, International Conference on Chinese Language Computing, Chicago, Illinois, 2000.
[3] Gale W.A., Church K.W., Identifying word correspondences in parallel texts, Fourth DARPA Workshop on Speech and Natural Language, Asilomar, California, 1991.
[4] Levenshtein V.I., Binary codes capable of correcting deletions, insertions and reversals, Doklady Akademii Nauk SSSR, 1965.
[5] Ma X., Liberman M., BITS: A method for bilingual text search over the Web, Machine Translation Summit VII, 1999.
[6] McEnery T., Wilson A., Corpus Linguistics, Edinburgh University Press, 1996.
[7] Melamed I.D., Models of translational equivalence among words, Computational Linguistics, 26(2), 2000, pp. 221-249.
[8] Resnik P., Parallel strands: A preliminary investigation into mining the Web for bilingual text, Proceedings of the Third Conference of the Association for Machine Translation in the Americas, 1998.
[9] Resnik P., Smith N., The Web as a Parallel Corpus, University of Maryland technical report UMIACS-TR-2002, 2002.
[10] Sinclair J., Corpus, Concordance, Collocation, Oxford University Press, 1991.
[11] Yianilos P.N., Kanzelberger K.G., The LikeIt intelligent string comparison facility, Technical Report, NEC Research Institute, 1997.

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-article-BAT5-0042-0015