Experimental Corpus of the Lithuanian Local Dialect of Punsk in Poland. Examples of the Lexical and Semantic Annotation
In the article the author describes the experimental corpus of the Lithuanian local dialect of Puńsk in Poland (ECorp-of-Punsk). It is the first corpus of this type for the Lithuanian local dialect. The corpus consists of three subcorpora. The first (referred to as fundamental) contains utterances given by Lithuanians in the local dialect, the second contains utterances given by Lithuanians in Polish, and the third contains aligned Polish-dialectal texts. Texts recorded between 1986 and 2012 are included in the ECorp-of-Punsk resources.
Lexical exponents of hypothetical modality in Polish and Lithuanian
The article focuses on the lexical exponents of hypothetical modality in Polish and Lithuanian. The purpose of comparing and contrasting these exponents is not only to identify all the relevant lexemes in both languages but also to answer the following question: whether the morphological exponents of hypothetical modality (the so-called modus relativus) familiar to Lithuanian have, or had, the effect of limiting the number of lexical exponents and the frequency of their use in Lithuanian (in comparison with Polish). Both languages are analysed using the method of theoretical contrastive studies, whose most important features are: (1) orienting the studies from content to form, and (2) using a semantic interlanguage as tertium comparationis. First, the content of hypothetical modality is presented, together with its definition and paraphrase. Next, the gradational character of this category is discussed. Six groups of lexemes are distinguished, expressing the corresponding degrees of hypothetical modality, from a shadow of uncertainty (minimal degree of probability) to almost complete certainty (maximum degree of probability). The experimental Polish-Lithuanian corpus is widely used in the study.
This article sets out to illustrate possible applications of electronic corpora in the translation classroom. Starting with a survey of corpus use within corpus-based translation studies, the didactic value of corpora in the translation classroom and their epistemic value in translation teaching and practice will be elaborated. A typology of translation practice-oriented corpora will be presented, and the use of corpora in translation will be positioned within two general models of translation competence. Special consideration will then be given to the design and application of so-called Do-it-yourself (DIY) corpora, which are compiled ad hoc with the aim of completing a specific translation task. In this context, possible sources for retrieving corpus texts will be presented and evaluated and it will be argued that, owing to time and availability constraints in real-life translation, the Internet should be used as a major source of corpus data. After a brief discussion of possible Internet research techniques for targeted and quality-focused corpus compilation, the possible use of the Internet itself as a macro-corpus will be elaborated. The article concludes with a brief presentation of corpus use in translation teaching in the MA in Specialised Translation Programme offered at Cologne University of Applied Sciences, Germany.
Over the past decades, corpus linguistics has become widespread in applied linguistics. Teachers have become acquainted with corpus linguistic methods and are using computer technology in their professional practice. A well-known example is learner corpora, with which researchers have obtained invaluable results concerning various aspects of learner language. This article, however, presents a new field within corpus linguistics: the teacher corpus. A corpus of teacher language (instead of learner language) has a lot to offer in terms of methodology and pedagogy.
In this article, we discuss the development and use of vocabulary lists aimed at learners of French as a foreign language. These lists are commonly based on corpora, which, in the ideal case, are representative, relevant and large. For English, corpora of this kind have been available for a long time (e.g. the COCA and the BNC), and vocabulary lists, which are often used in learning contexts, have been based on them. The situation is, however, less favorable for French, with fewer corpora meeting the criteria mentioned and, thus, fewer possibilities to create vocabulary lists that are useful for learners. In this contribution, we present the work done to create a vocabulary list, Riksprovsordlistan, containing about 4,000 words and used at all Swedish universities. The discussion focuses on methodological challenges such as the choice of counting unit (lemma vs. word family), the role of frequency, thematic vocabulary, and the characteristics of written vs. spoken corpora.
The aim of the paper is to explore the possibility of using corpora in teaching grammar to learners of Italian as a foreign language. The paper presents a corpus-based teaching proposal tried out with students of Italian at the Saints Cyril and Methodius University in Skopje in the academic year 2016/2017. The hypothesis underlying this teaching approach is that direct and guided use of corpora can raise students’ awareness of the complexity of the grammatical topic under study and also give them tools and methods to explore the language more autonomously. The first part of the paper examines the use of corpora in foreign language teaching, with particular attention to the use of these resources in teaching Italian as a foreign language. The main section describes the proposed activities on concessive clauses and the context in which they were tried out. The final part reports on the students’ observations, collected in a questionnaire, and addresses the potential and the limits of this teaching approach.
Every language has a specific combinatorial periphery consisting of words whose combinatorial potential is extremely restricted. These words, which lack syntactic and semantic autonomy and are usually referred to as bound words, unique words, cranberry words or monocollocable words (MWs), belong to small and closed collocation paradigms, their number of collocates ranging from one to a few (usually ± 7). The present article describes the phenomenon of monocollocability in present-day Italian, basing the analysis on a list of Italian MWs extracted from corpora and contained in the book Language Periphery, Monocollocable Words in English, German, Italian and Czech (Čermák et al., 2016). Italian MWs and the fixed combinations in which they occur are analysed in terms of syntactic structures, semantic features, collocation structures and frequency. Monocollocability is a phenomenon subject to change over time: even though MWs are often considered to be “relicts of the past”, the collected data show that a progressive restriction of the combinatorial capacity of certain words can be observed in Present-Day Italian as well.
The article presents the recent initiative of the authors of the article to prepare the ground for setting up a corpus of texts annotated from the viewpoint of Functional Sentence Perspective (FSP). The authors are followers of Jan Firbas’s approach to information structure, who have carried out a parallel analysis of a text of fiction in search of concepts within the FSP theory that need elaboration. The article outlines the discrepancies between different interpretations of selected phenomena within the text and suggests a refinement of some FSP concepts. It presents a simple FSP tagging system, which allows the annotation of FSP functions and degrees of communicative dynamism carried by communicative units.
This paper has a double aim: i) to empirically establish the functions naturally expressed by any and those of one of its Spanish counterparts, specifically cualquier(a), so as to identify possible cross-linguistic transfer; ii) to illustrate a high-performing methodological procedure. A set of tools, among them an ad hoc tertium comparationis consisting of a set of cross-linguistic labels, a parallel corpus (P-ACTRES) and a reference corpus (CREA), is used to explore: i) the uses of any in context and the resulting translation environments served by cualquier(a); ii) the degree of matching between translated cualquier(a) and its non-translated usage in standard European Spanish. The corpus-based procedure basically follows Krzeszowski's contrastive model (1990) with the addition of a 'target language fit' stage (Chesterman, 2004). The analysis shows different behaviour in translated and non-translated Spanish: cualquier(a) is underused as a translation option for 'existential' any and acquires a new function, 'negative', which is not a possibility in non-translated language for the same contexts. The analysis also corroborates the usefulness and the replicability of the methodological procedure.
Experimental Polish-Lithuanian Corpus with the Semantic Annotation Elements
In the article the authors present the experimental Polish-Lithuanian corpus (ECorpPL-LT), created to support Polish-Lithuanian theoretical contrastive studies and a Polish-Lithuanian electronic dictionary, and to assist sworn translators. The semantic annotation introduced into ECorpPL-LT is extremely useful in Polish-Lithuanian contrastive studies and also proves helpful in translation work.
Consulting documented language usage in large corpora has become fundamental to lexicography. The selection and systematization of lexical units are supported by corpus tools that provide frequency data and various kinds of concordances, as illustrated by the practice of the current project, a multilingual thematic dictionary. On-line dictionaries can also provide a richer and more up-to-date vocabulary. The dictionary in progress employs a special structure, based on pragmatic and semantic relations, that aids language learning. Its machine-readable version will be better suited to exploiting this potential.
The present study, couched within the framework of the Concept Types and Determination theory (CTD) and relying upon corpus data, attempts to provide further evidence for the claim that Czech, especially its informal spoken variety, is developing a definite article from the distance-neutral demonstrative ten in adnominal uses. The CTD theory has proved its utility for studying emerging definite articles in Western Slavic languages in the works of Adrian Czardybon and Albert Ortmann. At its core lies the distinction between the so-called “pragmatic” and “semantic” definiteness. It is generally assumed that emerging definite articles spread from the former to the latter, and the grammaticalization process is considered accomplished once the former demonstrative systematically appears in contexts of semantic definiteness. This study applies the distinction, made by Löbner, to a corpus sample of 1,000 occurrences of the adnominal ten, many of which appear to manifest characteristics typical of definite articles across languages.
This paper puts forward the hypothesis that there is a future infinitive evolving in present day German and addresses the theoretical consequences that this might have. Section 1 gives the basic definitions as well as some introductory examples. Section 2 presents evidence in favour of the hypothesis, and possible objections are considered in section 3. Finally, section 4 focusses on more theoretical implications.
It is only in the new era of large electronic corpora that some low-frequency grammatical structures can be tested for their communicative as well as their systemic status. This is also the case of the complex predicate of the type ‘mít (to have) + abstract noun’ (e.g. mít zkušenost, to have experience) in the sense of ‘být zkušený’ (to be experienced) in contemporary Czech. The expression v sobě (in oneself), if attached to the predicative syntagma of the type ‘mít (to have) + abstract noun’ can make this complex predicate acceptable and grammatical.
The purpose of the paper is twofold. First, to describe the already implemented idea of DjVu corpora, i.e. corpora which consist of both scanned images and a transcription of the texts, with the words linked to their occurrences in the scans. Second, to present a case study of a corpus consisting of almost 5 000 pages of Polish historical texts dating from 1570 to 1756 (it is practically the very first corpus of historical Polish). The tools described are general-purpose and freely available under the GNU GPL license, so they can also be used for other purposes.
Although palatalization changing [k] into [tS] was most widespread in Southumbria, the previous examination (Kocel 2009, 2010) has already proved that on no account can it be perceived as a homogeneous process. This lack of consistency is reflected in many instances of palatal forms found in the North alongside many nonpalatal ones encountered in the East Midlands and London. Consequently, the substantial number of such “odd” forms seems to defy the existence of clear-cut boundaries between the above mentioned areas, allowing for an unhindered influx and amalgamation of ostensibly dialect-specific variants. The problem appears even more complex, taking into account the vast collection of dialectally unidentified Middle English texts which, containing both palatal and nonpalatal forms, only corroborate the fact that palatalization could not be dialect or even area specific. The multitude of variants present in those texts, a result of the Scandinavian influence and dialectal borrowing, point to the process of the lexical diffusion of these forms across the whole English territory, affecting in particular such high-frequency items as the grammatical words each, much, such and which. The aim of the study, thus, will be to determine the extent of palatalization affecting these grammatical words, through the analysis of the spelling/phonological discrepancies and the distribution of each, much, such and which in unclassified Late Middle English sources. The data come from the Innsbruck Corpus of Middle English Prose, The Middle English Dictionary and A Linguistic Atlas of Late Mediaeval English.
OE *durran ‘dare’ is a preterite-present verb and one of six such verbs whose various forms have survived into Modern English. The main feature of the members of the group is that their strong past tense acquired a present meaning, and thus a new weak past tense developed over time. An outline of other characteristic features of these verbs is included in section ‘0’ (introductory remarks), yet the aim of the present paper is to establish the distribution of the verb *durran in Middle English with regard to periods and regions, also considering differences in spelling. Also, the paper examines fixed expressions such as how dare you or I dare say. The Middle English data are derived from the Prose corpus of the Innsbruck computer archive of machine-readable English texts. Additional sources, like the Dictionary of Old English on CD-ROM, the electronic Middle English dictionary and the Oxford English dictionary online are also referred to.
Although the use of Geographic Information Systems (GIS) has a long history in archaeology, spatial technologies have been rarely used to analyse the content of textual collections. A newly developed approach termed Geographic Text Analysis (GTA) is now allowing the semi-automated exploration of large corpora incorporating a combination of Natural Language Processing techniques, Corpus Linguistics, and GIS. In this article we explain the development of GTA, propose possible uses of this methodology in the field of archaeology, and give a summary of the challenges that emerge from this type of analysis.