Možnosti a meze korpusové lingvistiky (Possibilities and Limits of Corpus Linguistics)
This paper addresses the two most common comments on corpus linguistics: 1) a corpus is merely a card file index in electronic form, and 2) corpus linguistics covers only corpus construction and linguistic annotation. In response to the first comment, we argue that a corpus consists of much more complex material and can be exploited in unprecedented ways. In response to the second, we point out that corpus linguistics is an independent linguistic discipline that makes substantial contributions to linguistic theory and language description.
The study deals with the collocations 'SUBST + AKO + INF' and their syntactic status, which determines the use of the comma in the selected types of constructions. Drawing on the Slovak National Corpus, the study identifies the most frequent nouns occurring in this type of construction. The corpus research shows that the attributive monomial infinitive sentences undergo complex sentence constructions, which can result in a syntagma with a simple attribute within a particular group of nouns. This process is reflected in the irregular use of the comma in the corpus texts.
The purpose of this paper is to investigate Proto-Germanic word order. To carry out this investigation, we needed to collect a number of texts written in the oldest Germanic languages and to produce a tagged corpus from them. Because there are no written texts in Proto-Germanic proper, we took into account texts from Old High German (OHG), Gothic and West-Saxon, as well as runic inscriptions. To make the analysis more objective, we chose texts that could be analysed in parallel across the different languages. The best candidate for this analysis was the New Testament. This procedure also allowed us to have recourse to the Vulgate and Septuagint and to make further comparisons. The data that we obtained mostly confirm the opinions generally held about Proto-Germanic word order, but some details seem to say the opposite, for example that Proto-Germanic main clauses were predominantly VO. Therefore, we venture to claim that Proto-Germanic was a VO language, especially if we take into account the elements V(erb) and O(bject).
Using the preposition 'bez' (without), these clauses explicitly express an action whose non-realisation is significant for the mode of (un)realisation of the action of the main clause, cf. 'Dívku pozdravil bez toho, aby se jí dotkl' (He greeted the girl without touching her). The author deals with the attitudes of Czech linguists toward these new competitors of the preposition 'aniž' and with the frequency of sentences with the connectives 'bez toho, aby/že' in contemporary Czech. The article also characterises the position of these subordinate clauses in the system of the Czech language from the point of view of their relationship to standard Czech.
The study builds on the article Frequency of Lexical Units of Foreign Origin in Slovak (Garabík – Karčová, 2019), which describes the origins of the most frequent words in Slovak texts and concludes that there is an unexpectedly low percentage of loanwords in the sample. Our study analyses their dataset further to explore the relation between a word’s frequency and its probability of being a loanword, and finds an inverse proportion between these two variables. Building on these findings, we construct a model of loanword distribution that answers the question posed in the title of this paper.
The purpose of this paper is to provide a contrastive analysis of some metaphorical conceptualizations of the notions expressed by the words alegría and radość. I analyze the metaphorical expressions that contain the lexical items in question and are based on the source domain of LIQUID. The method used in this study combines the theory of conceptual metaphor with the methods of corpus-based linguistics. The study aims to compare how the source domain of LIQUID is elaborated linguistically in the analyzed expressions and to show which parameters of the emotion of joy are highlighted by particular aspects of this domain.
The article presents the structure of the Corpus of Historical Slovak, a diachronic corpus of written Slovak texts predating attempts at language standardization (texts from the 15th to the 18th century). The content of the corpus is based predominantly on existing published transcriptions of manuscripts; in this sense it is an opportunistic corpus, aiming primarily to collect existing texts. However, we also collect and transcribe some documents directly in order to improve the chronological balance of the corpus. The corpus aims to preserve historical orthography accurately, but given existing standards in transcribing historical Slovak, this was not always possible with complete accuracy.
On the occasion of the 20th Congress of Linguists, which demonstrated the dominance of the socio-cognitive paradigm, we compare the functionalist and the cognitive approach to understanding the nature of lexical meaning. Both theoretical frameworks have a strong explanatory dimension and are compatible to a significant degree. Within a certain methodological synthesis, we examine the internal consistency and mutual compatibility of aspects of some models of meaning outlined or developed in the literature (V. Mathesius, J. Filipec, J. Dolník, D. Geeraerts, P. Hanks, J. Kořenský, M. Nagy). We regard as theoretically primary the model that reflects the processual character of language, i.e. meaning in actual speech and the preconditions of this process in the form of the meaning potential: a dynamically and probabilistically organized cognitive base, a semantic-pragmatic network. In this model, the word represents a unilateral sign. The compatibility of cognitivist interpretations with psychological and neurobiological knowledge should also be taken into account. The secondary model, i.e. a user-oriented presentation model (such as a lexicographical entry), has a more static character. It uses the presentational inventory of functional structural linguistics and “discretizes” the cognitive continuum into bilateral units. This model is usage-based: it rests on a large volume of evidence of language usage that can be pre-processed by corpus tools into contextual patterns, i.e. “units” larger than the word, which is a characteristic feature of corpus approaches.
Syntaktická proměna Českého akademického korpusu (The Syntactic Transformation of the Czech Academic Corpus)
The idea of the Czech Academic Corpus (CAC) came to life in 1971 thanks to the Department of Mathematical Linguistics within the Czech Language Institute. By the mid-1980s, a total of 540,000 words had been manually annotated morphologically and syntactically. After the Prague Dependency Treebank (PDT), the largest annotated treebank of Czech written texts, was built, the conversion of the CAC to the PDT format began. The main goal was to make the CAC and the PDT compatible and thus to enable the integration of the CAC into the PDT. The second version of the CAC is therefore a complete conversion of the internal format and annotation schemes. The conversion of the syntactic annotation began three years after the syntactic annotation of the PDT was finished. Such a situation is exceptional because, to our knowledge, there is no other language for which such a significant amount of data is being annotated in two successive projects. This article summarizes the experience gained during the conversion of the CAC syntactic annotation.
The joint project of the Hungarian Academy of Sciences in Budapest and the Czech Academy of Sciences in Prague, Computational Lexicology and Dialogue Research, has not only inspired specific approaches to new linguistic research but has also directed attention toward the history of Hungarian and Czech linguistic description. Some previously hidden parallels in lexicography, grammar and corpus projects have been discovered and discussed. In this paper, an overview of the main similarities in the phases of cultivation of these two languages reveals, among other things, the important unifying role of the European style of education and scholarly work. In addition, a brief historical outline shows Czech and Hungarian as subjects of linguistic research that occupied similar positions and solved their specific problems in historical parallel. This information makes it possible to present new projects in corpus linguistics in a broader historical context.
Jazyková regulace jako věc dohody (Language Regulation as a Matter of Agreement)
This article, a review of Václav Cvrček’s book on language regulation and the Concept of Minimal Intervention (2008), focuses on four main issues. (1) For the most part, Cvrček deals with linguists’ interventions in language. He pays little attention to the interventions of individuals in real interactions. (2) In Cvrček’s opinion, linguists should present the public not with prescriptive codifications but with descriptive ones. However, there is a more important difference between a reference book, which is presented and/or perceived as an instruction for language behavior, and a hypothetically exhaustive description of a language or its varieties, which is neither presented nor perceived as instructive. (3) The authors find the definitions of the concepts of “real” and “declarative” attitudes very problematic. (4) Language norms are wrongly equated with the declarative attitudes of speakers towards their language. However, language norms can be neither inferred solely from usage nor reduced to usage. Rather, they consist of language users’ awareness of the language and its usage, or a set of features of regularly used linguistic means and their combinations. Finally, the authors suggest several specific points on which Czech linguists should agree before putting possible regulatory changes into practice.
