Article title
Authors
Identifiers
Title variants
Publication languages
Abstracts
Parallel corpora have recently become an indispensable resource in multilingual natural language processing. Manual preparation of a bilingual corpus is a laborious task, so methods for the automated creation of parallel corpora are currently of interest to many researchers. A number of sophisticated and effective algorithms for collecting parallel texts from the Web have already been created. Unfortunately, none of them has been used to create a Polish-German corpus. The aim of this research has therefore been threefold: to verify the efficiency of existing algorithms for collecting a Polish-German parallel corpus, intended as a reference source for a Machine Translation system; to propose a new algorithm; and to present the results achieved by the new algorithm.
Journal
Year
Volume
Pages
41--45
Physical description
Bibliography: 11 items.
Contributors
author
- Faculty of Mathematics and Computer Science, Adam Mickiewicz University in Poznań, Poznań, Poland, monika.rosinska@gmail.com
Bibliography
- [1] Brown P., Cocke J., Della Pietra S., Della Pietra V., Jelinek F., Mercer R., Roossin P., A statistical approach to machine translation, Computational Linguistics, 16(2), 1990, pp. 79-85.
- [2] Chen Jiang, Jian-Yun Nie, Web parallel text mining for Chinese English cross-language information retrieval. International Conference on Chinese Language Computing, Chicago, Illinois, 2000.
- [3] Gale W.A., Church K.W., Identifying word correspondences in parallel texts, Fourth DARPA Workshop on Speech and Natural Language, Asilomar, California, 1991.
- [4] Levenshtein V.I., Binary codes capable of correcting deletions, insertions and reversals, Doklady Akademii Nauk SSSR, 1965.
- [5] Ma X., Liberman M., Bits: A method for bilingual text search over the web, Machine Translation Summit VII, 1999.
- [6] McEnery T., Wilson A., Corpus Linguistics, Edinburgh University Press, 1996.
- [7] Melamed I. Dan, Models of translational equivalence among words, Computational Linguistics, 26(2), 2000, pp. 221-249.
- [8] Resnik P., Parallel strands: A preliminary investigation into mining the Web for bilingual text, Proceedings of the Third Conference of the Association for Machine Translation in the Americas, 1998.
- [9] Resnik P., Smith N., The Web as a Parallel Corpus, University of Maryland technical report UMIACS-TR-2002, 2002.
- [10] Sinclair J., Corpus Concordance Collocation, Oxford University Press, 1991.
- [11] Yianilos P.N., Kanzelberger K.G., The LikeIt intelligent string comparison facility, Technical Report, NEC Research Institute, 1997.
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-article-BAT5-0042-0015