The Upper Sorbian text corpus and further sources of information with regard to Upper Sorbian on the Internet
In the present era of globalisation and the omnipresence of the Internet, Sorbian linguistics faces new challenges along the lines of “What is not on the Internet does not exist”. Demand is increasing for digital, online-accessible sources of information on Upper and Lower Sorbian, both as working tools and reference points for language practice and as a source for academic research. In response to this development, the Foundation for the Sorbian People established a workgroup called “Sorbian in the new media” at the end of 2012, which identified the creation of an online German-Upper Sorbian dictionary as the major task in this field. The focus of this article, however, is the Upper Sorbian text corpus HoTKo, which was created by the Sorbian Institute and made available in co-operation with the Institute of the Czech National Corpus at the Charles University in Prague. The article presents the history and development of the corpus, its extent and shape, and its links to, or incorporation into, further digital projects planned by the Sorbian Institute for the Upper Sorbian language.
Consulting documented language usage in large corpora has become a fundamental tool in lexicography. The selection and systematization of lexical units are supported by corpus tools that provide frequency data and various kinds of concordances, as illustrated by the practice of the current project of a multilingual thematic dictionary. Online dictionaries can also provide a richer and more up-to-date vocabulary. The dictionary in progress employs a special structure, based on pragmatic and semantic relations, that aids language learning. Its machine-readable version will be better suited to exploiting the potential of this structure.
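The two corpus outputs mentioned above, frequency counts and concordances, can be illustrated with a minimal sketch. The sample sentence, the function names and the simple word tokenizer are assumptions made for illustration only; they do not reproduce the tools used in the dictionary project.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercased word tokens; \w is Unicode-aware for str patterns in Python 3.
    return re.findall(r"\w+", text.lower())


def kwic(tokens, keyword, window=3):
    """Keyword-in-context lines: (left context, keyword, right context)."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits


# Invented sample text, standing in for a real corpus.
sample = "The dictionary lists words. A thematic dictionary groups words by topic."
tokens = tokenize(sample)

freq = Counter(tokens)                        # frequency list
lines = kwic(tokens, "dictionary", window=2)  # concordance for one headword

for left, key, right in lines:
    print(f"{left:>12} [{key}] {right}")
```

A real corpus manager would add lemmatization and sorting of concordance lines; the sketch only shows the shape of the data a lexicographer consults.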
The article presents an analysis of selected French nouns of olfactory perception for the purposes of machine translation. The initial hypothesis, according to which nouns such as odeur, parfum, arôme, puanteur and senteur form a semantically and grammatically coherent set, i.e. an object class characterized by a certain shared set of operations (verbs), is subjected to corpus verification. The research, based on the French corpus frTenTen12, confirms this hypothesis and makes it possible to identify 100 verbal operators common to all elements of the studied class. The further part of the article presents sample descriptions of the collected language material in IT-implementable formats, i.e. formats that can be used in machine translation software. The first table shows the syntactic combinatorics of the class in French and Polish. The second takes the form of a bilingual lexicographical “flashcard”, in which the operators characterizing the studied class are divided into three groups (constructors, manipulators and accessors), in accordance with the object-oriented approach of Wiesław Banyś.
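The verification step and the “flashcard” format can be sketched as follows. This is a minimal illustration under stated assumptions: the collocation data are invented placeholders rather than frTenTen12 results, the assignment of each verb to a group and the Polish equivalents are illustrative guesses, and the names `attested`, `common_operators` and `Flashcard` are hypothetical.

```python
from dataclasses import dataclass, field

# Toy data: noun -> verbs attested with it (invented, NOT corpus results).
attested = {
    "odeur": {"sentir", "dégager", "masquer", "flotter"},
    "parfum": {"sentir", "dégager", "masquer", "porter"},
    "puanteur": {"sentir", "dégager", "masquer"},
}


def common_operators(attested):
    """Verbs shared by every noun: candidate operators of the object class."""
    sets = list(attested.values())
    return set.intersection(*sets) if sets else set()


@dataclass
class Flashcard:
    """Bilingual card: French operator -> Polish equivalent, per group."""
    class_name: str
    constructors: dict = field(default_factory=dict)
    manipulators: dict = field(default_factory=dict)
    accessors: dict = field(default_factory=dict)


shared = common_operators(attested)

# Group assignments below are assumptions made for the sake of the example.
card = Flashcard(
    class_name="<odeur>",
    constructors={"dégager": "wydzielać"},
    manipulators={"masquer": "maskować"},
    accessors={"sentir": "czuć"},
)

classified = set(card.constructors) | set(card.manipulators) | set(card.accessors)
print(sorted(shared))
```

Verbs attested with only some of the nouns, such as `flotter` or `porter` in the toy data, fall outside the intersection and would not count as operators of the whole class.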
The article presents methods of applying lexicographic data to foreign language teaching and linguistic studies, and introduces an automated graph visualization technique. Both the retrieval of the linguistic data and its visualization in the form of directed graphs were carried out with available computer software. The material comprises the group of Russian adjectives with the prefixes без- / бес- and the synonymic chains initiated by them. The article describes the automated extraction by means of regular expressions, the steps involved in editing the generated database, and the import of the final database into visualization software. The grammatical and semantic characteristics of the sample group of lexemes are not the focus of this article, although further analysis of the selected vocabulary units is planned for future studies.
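The extraction and graph-building steps can be sketched in a few lines. The entry format (`headword: synonym, synonym`), the sample entries and the choice of Graphviz DOT as the export format are all assumptions made for this illustration; the abstract does not name the source format or the visualization software.

```python
import re

# Headword with the prefix без-/бес- and a common adjectival ending,
# followed by a colon and a comma-separated synonym list (assumed format).
ENTRY = re.compile(r"^(бе[зс]\w+(?:ый|ий|ой))\s*:\s*(.+)$")


def extract_edges(lines):
    """Return directed (headword, synonym) pairs for matching entries."""
    edges = []
    for line in lines:
        m = ENTRY.match(line.strip())
        if not m:
            continue  # skip entries whose headword is not a без-/бес- adjective
        head = m.group(1)
        for syn in m.group(2).split(","):
            syn = syn.strip()
            if syn:
                edges.append((head, syn))
    return edges


def to_dot(edges):
    """Serialize the edges as a Graphviz DOT directed graph."""
    body = "\n".join(f'  "{a}" -> "{b}";' for a, b in edges)
    return "digraph synonyms {\n" + body + "\n}"


# Invented sample entries, standing in for a synonym dictionary.
entries = [
    "бесполезный: напрасный, тщетный",
    "безграничный: бесконечный, беспредельный",
    "бесконечный: вечный",
    "дом: здание",
]

edges = extract_edges(entries)
dot = to_dot(edges)
print(dot)
```

Because the synonym бесконечный is itself a headword in the sample, the resulting graph contains a chain from безграничный through бесконечный to вечный, which is the kind of synonymic chain the directed-graph view makes visible.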