Based on the written part of the British component of the International Corpus of English (ICE-GB), this paper investigates the interrelationship between the length and complexity of sentential constituents and their positions in the sentence. Results show that length and complexity affect the ordering of sentential constituents. Within the sentence, the longest and most complex constituents tend to occur in final position, and the relatively shorter and less complex constituents tend to occur in initial position. However, for sentential constituents in other positions, the length-complexity-position relationship appears to be random. Possible explanations for the findings are offered from several perspectives, especially that of the distribution of given and new information.
2
Previous research on word class distribution claimed that 37% of word tokens are nouns, suggesting that there might be a certain regularity in the proportion of nouns across human languages. To explore this possibility, we examined the proportions of nouns and four other word classes within British and American English, and across seven languages, in different word-frequency bands. Results indicated that the noun proportion is about 37% or higher, and that it increases with word rarity. Among frequent words, the proportion of nouns increases as that of minor word classes decreases, whereas among rare words, the noun proportion remains at a stable level.
3
Based on real-text corpora with syntactic annotation, this study quantitatively addressed two questions: whether quantitative methods and indexes can point to the diachronic syntactic drifts characterizing the evolution from Latin to the Romance languages, and whether these methods and indexes can provide evidence of the syntactic features shared among the Romance languages and define them as a distinctive language subgroup. Our study shows that the distributions of dependency directions suggest positive answers to both questions. In addition, the dependency syntactic networks extracted from the dependency treebanks reflect the degree of inflectional variation of a language, and the clustering analysis shows that these parameters, in spite of some imperfections, can also help differentiate the Romance languages from Latin diachronically and from other languages synchronically.
4
This paper offers a quantitative analysis of the syntactic and typological properties of Chinese based on five Chinese dependency treebanks. The study shows that the mean dependency distance of Chinese is 2.84; that 40-50% of dependencies hold between non-adjacent words; that Chinese is a mixed language with a governor-final and SV-VO-AdjN preference; and that the mean dependency distance of governor-initial dependencies is greater than that of governor-final ones. Methodologically, the paper uses five treebanks with different text genres and annotation schemes as a resource for studying the syntactic features of a language. This method avoids corpus-specific influences on the results, so that the conclusions can be more reliable and robust. If suitable treebanks are available, the method can easily be applied to other languages. In this way, the method has a broad theoretical and cross-linguistic perspective.
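As a rough illustration of the measures named in this abstract, the following minimal sketch computes a mean dependency distance, the share of non-adjacent dependencies, and the share of governor-final dependencies for a single toy sentence. It is not the paper's procedure: the function name `dependency_stats`, the head-index encoding, and the example annotation are all hypothetical, and the sketch assumes that dependency distance is the absolute difference between the linear positions of a word and its governor, so that adjacent words have distance 1.

```python
from statistics import mean


def dependency_stats(heads):
    """Summarize one sentence's dependencies.

    heads[i] is the 1-based position of the governor of word i + 1;
    0 marks the root, which has no governor and is skipped.
    (This encoding is an assumption made for the sketch.)
    """
    distances = []
    governor_final = 0
    for position, head in enumerate(heads, start=1):
        if head == 0:
            continue  # the root contributes no dependency
        # Assumed definition: distance = |governor position - dependent position|
        distances.append(abs(head - position))
        if head > position:
            governor_final += 1  # governor follows its dependent
    if not distances:
        raise ValueError("sentence contains no dependencies")
    n = len(distances)
    return {
        "mean_dependency_distance": mean(distances),
        "non_adjacent_share": sum(d > 1 for d in distances) / n,
        "governor_final_share": governor_final / n,
    }


# Hypothetical annotation of "The old dog chased cats":
# The -> dog, old -> dog, dog -> chased, chased = root, cats -> chased
heads = [3, 3, 4, 0, 4]
stats = dependency_stats(heads)
print(stats)
```

For a treebank, the same quantities would typically be pooled over all dependencies in all sentences rather than averaged sentence by sentence; which convention the paper follows is not stated in the abstract.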