This study aims to provide and analyze a representative list of Czech initial syllable onsets and final codas along with their frequencies of occurrence in running text (token frequencies) and in the vocabulary of unique word forms extracted from it (type frequencies). The frequency data are important because many experiments have demonstrated that phonotactics is not categorical, but rather gradient in nature. Importantly, the study analyzes and compares both spoken and written texts, using the Czech National Corpus, and the two modalities are hypothesized to yield different outcomes. All words in the sample were transcribed phonemically and analyzed. A general preference was found for phonotactic structures that are simple in the context of the attested inventory, and the two corpora differed most in the repertoire of complex onsets/codas (some sequences being unique to one modality) as well as in their respective frequencies. The results are discussed in relation to previous studies of Czech phonotactics, and evaluated with respect to implications for phonological theory, focusing on spoken/written and type/token comparisons.
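The distinction between token and type frequencies can be illustrated with a minimal sketch. It is not the study's procedure: the vowel set, the helper names `initial_onset` and `onset_frequencies`, and the toy word list are all assumptions made for illustration, and the words are handled orthographically rather than in phonemic transcription.

```python
from collections import Counter

# Assumption: a simplified orthographic vowel set, not a full phonemic inventory.
VOWELS = set("aeiouyáéíóúůýě")


def initial_onset(word):
    """Return the consonant sequence before the first vowel (may be empty)."""
    for i, ch in enumerate(word):
        if ch in VOWELS:
            return word[:i]
    # Simplification: words without a vowel letter (e.g. with syllabic r/l)
    # are returned whole; a real analysis would treat them separately.
    return word


def onset_frequencies(tokens):
    """Count onsets over running text (tokens) and over unique word forms (types)."""
    token_freq = Counter(initial_onset(w) for w in tokens)
    type_freq = Counter(initial_onset(w) for w in set(tokens))
    return token_freq, type_freq


# Hypothetical toy sample standing in for running text.
tokens = ["strom", "strom", "pes", "stroj", "pes", "pes", "okno"]
token_freq, type_freq = onset_frequencies(tokens)
print(token_freq)
print(type_freq)
```

In this toy sample the onset "p" is as frequent as "str" by token count but rarer by type count, because it comes from a single repeated word form. This is the kind of divergence a type/token comparison is meant to expose.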
Variabilita češtiny: multidimenzionální analýza (Variability of Czech: A Multidimensional Analysis)
The article summarizes the theoretical foundations and results of a corpus-driven study of register variability in contemporary Czech. The descriptive framework is based on the methodology of multidimensional analysis, as previously applied to various other languages (see Biber 1995). The starting point is a quantitative analysis of a custom-built, genre-diversified corpus, in which linguistic features likely to be related to functional and systematic variability at different linguistic levels have been identified. Factor analysis then yields a model that identifies, in the case of Czech, eight dimensions of variation among texts. The first two dimensions explain the greatest proportion of variance and can be described as the dichotomies dynamic vs. static and spontaneous vs. prepared.
