SOME PROBLEMS IN MULTILINGUAL DIGITAL DICTIONARIES
The article discusses observations from the joint work of Polish and Bulgarian research groups on the digital Bulgarian-Polish and Polish-Ukrainian dictionaries, as well as on the projected multilingual (initially Bulgarian-Polish-Ukrainian) dictionary. The researchers are currently building a parallel corpus of Bulgarian and Polish texts available on the Internet, in which each text corresponds one-to-one to its translation. They are also developing a comparable corpus of Bulgarian and Polish texts (excerpts from newspapers, literary works, and Internet documents) whose sizes are comparable across the two languages. Together, the parallel and comparable corpora form the first Bulgarian-Polish corpus, which will be prepared in CES (Corpus Encoding Standard) format, manually or with ad hoc tools, and annotated at the paragraph and sentence levels in accordance with international text annotation standards. This bilingual corpus will provide a sample of the vocabulary to be included in an initial experimental version of the Bulgarian-Polish digital dictionary. Bilingual and multilingual digital dictionaries are subject to additional constraints, which makes it all the more important that the description of the language-specific properties of the headword in each entry be simple yet comprehensive. A further problem is that a lexical form in any one language may have several meanings that do not overlap with those of its counterparts in the other languages. Considerable difficulties must be overcome for a dictionary to satisfy the needs of a translator, a language researcher, or an everyday user.