Article title

Entropic evolution of lexical richness of homogeneous texts over time : A dynamic complexity perspective

Publication languages
EN
Abstracts
EN
This work examines how the lexical richness of the China Government Work Report corpus, measured by entropy, evolves over time, under the fundamental assumption that these texts are linguistically homogeneous. The corpus is interpreted and studied as a dynamic system whose components undergo spontaneous variation, adjustment, self-organization, and adaptation to fit the semantic, discourse, and sociolinguistic functions that the text is set to perform. Both the macroscopic structural trend and the microscopic fluctuations of the time series of the entropic process of interest are meticulously investigated from the perspective of dynamic complexity theory. Rigorous nonlinear regression analysis is provided throughout the study as empirical justification for the theoretical postulations. An overall concave model incorporating modulated fluctuations is proposed and statistically tested to represent the key quantitative findings. Possible extensions of the current study are discussed.
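As a rough illustration of the kind of measurement the abstract describes, the sketch below computes the Shannon entropy of the word-frequency distribution of each text and collects the values into a series. The abstract does not state the exact entropy definition or tokenization used, so the formula, the function name `shannon_entropy`, and the toy texts are all assumptions made for illustration only.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Shannon entropy (in bits) of the empirical token distribution.

    Assumption: lexical richness is approximated by the entropy of
    relative word frequencies; the paper may define it differently.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical, already-tokenized texts standing in for yearly reports.
texts_by_period = {
    "period_1": "a b a b".split(),  # two word types, equally frequent
    "period_2": "a b c d".split(),  # four word types, equally frequent
}

# One entropy value per period gives a small time series.
entropy_series = {
    period: shannon_entropy(tokens)
    for period, tokens in texts_by_period.items()
}
```

A text that spreads its tokens over more word types yields a higher value, which is why entropy can serve as one proxy for lexical richness. Trend and fluctuation models such as those mentioned in the abstract would then be fitted to a series of this kind.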
Year
Pages
569-599
Physical description
Bibliography: 44 items, tables, charts.
Contributors
author
  • The Chinese University of Hong Kong, Hong Kong
Bibliography
  • [1] Martin Bailyn (1994), A Survey of Thermodynamics, American Institute of Physics, New York.
  • [2] Doug Beeferman, Adam Berger, and John Lafferty (1997), A model of lexical attraction and repulsion, in Proceedings of the ACL, pp. 373-380, Madrid, Spain.
  • [3] Soren Bisgaard and Murat Kulahci (2004), Time Series Analysis and Forecasting by Example, John Wiley & Sons, Hoboken, New Jersey.
  • [4] Juliette Blevins (2004), Evolutionary Phonology: The Emergence of Sound Patterns, Cambridge University Press, Cambridge, MA.
  • [5] Peter F. Brown, Steven A. Della Pietra, Vincent J. Della Pietra, Jennifer C. Lai, and Robert L. Mercer (1992), An estimate of an upper bound for the entropy of English, Computational Linguistics, 18 (1): 31-40.
  • [6] Samprit Chatterjee and Ali S. Hadi (2012), Regression Analysis by Example, John Wiley & Sons, New York.
  • [7] Qinghua Chen, Jinzhong Guo, and Yufan Liu (2012), A statistical study on Chinese word and character usage in literatures from the Tang Dynasty to the present, Journal of Quantitative Linguistics, 19: 232-248.
  • [8] William Croft (2008), Evolutionary linguistics, Annual Review of Anthropology, 37: 219-234.
  • [9] Scott A. Crossley and Danielle S. McNamara (2011), Shared features of L2 writing: Intergroup homogeneity and text classification, Journal of Second Language Writing, 20 (4): 271-285.
  • [10] Etienne Denoual (2005), The influence of example-data homogeneity on EBMT quality, in Proceedings of the Second Workshop on Example-Based Machine Translation, pp. 35-42, Phuket, Thailand.
  • [11] Alvar Ellegård (1953), The Auxiliary Do: the Establishment and Regulation of Its Use in English, Almqvist and Wiksell, Stockholm.
  • [12] Robert Fildes (1992), The evaluation of extrapolative forecasting methods, International Journal of Forecasting, 8: 81-98.
  • [13] Dmitriy Genzel and Eugene Charniak (2002), Entropy rate constancy in text, in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pp. 199-206, Philadelphia.
  • [14] Stefan Th. Gries (2006), Exploring variability within and between corpora: some methodological considerations, Corpora, 1 (2): 109-151.
  • [15] Trevor Hastie, Robert Tibshirani, and Jerome Friedman (2009), The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, New York.
  • [16] Douglas M. Hawkins (2004), The problem of overfitting, Journal of Chemical Information and Computer Sciences, 44: 1-12.
  • [17] Scott Jarvis (2013), Capturing the diversity in lexical diversity, Language Learning, 63: 87-106.
  • [18] Victoria Johansson (2008), Lexical diversity and lexical density in speech and writing: a developmental perspective, Lund Working Papers in Linguistics, 53: 61-79.
  • [19] Adam Kilgarriff (2001), Comparing corpora, International Journal of Corpus Linguistics, 6 (1): 1-37.
  • [20] Adam Kilgarriff and Gregory Grefenstette (2003), Introduction to the special issue on the web as corpus, Computational Linguistics, 29 (3): 333-348.
  • [21] Andras Kornai, Peter Halacsy, Viktor Nagy, Csaba Oravecz, Viktor Tron, and Daniel Varga (2006), Web-based frequency dictionaries for medium density languages, in Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, pp. 1-8, Trento, Italy.
  • [22] Yu-Min Ku and Richard C. Anderson (2003), Development of morphological awareness in Chinese and English, Reading and Writing: An Interdisciplinary Journal, 16 (1): 399-422.
  • [23] Diane Larsen-Freeman and Lynne Cameron (2008), Complex Systems and Applied Linguistics, Oxford University Press, Oxford.
  • [24] Namhee Lee and John H. Schumann (2003), The evolution of language and the symbolosphere as complex adaptive system, paper presented at the American Association of Applied Linguistics Conference, Arlington, VA.
  • [25] Brian MacWhinney (2007), A unified model, in P. Robinson and N. Ellis, editors, Handbook of Cognitive Linguistics and Second Language Acquisition, Lawrence Erlbaum Associates, Mahwah, NJ.
  • [26] Elinor McKone (1995), Short-term implicit memory for words and non-words, Journal of Experimental Psychology: Learning, Memory, and Cognition, 21: 1108-1126.
  • [27] Paul Meara (2006), Emergent properties of multilingual lexicons, Applied Linguistics, 27 (4): 620-644.
  • [28] Charles A. Perfetti and Lihai Tan (1998), The time-course of graphic, phonological, and semantic activation in Chinese character identification, Journal of Experimental Psychology: Learning, Memory, and Cognition, 24: 1-18.
  • [29] Nornadiah M. Razali and Yap B. Wah (2011), Power comparisons of Shapiro-Wilk, Kolmogorov-Smirnov, Lilliefors and Anderson-Darling tests, Journal of Statistical Modeling and Analytics, 2 (1): 21-33.
  • [30] Magnus Sahlgren and Jussi Karlgren (2005), Counting lumps in word space: density as a measure of corpus homogeneity, in Proceedings of 12th Symposium on String Processing and Information Retrieval, pp. 124-132, Buenos Aires, Argentina.
  • [31] Oliver Schabenberger and Francis J. Pierce (2002), Contemporary Statistical Models for the Plant and Soil Sciences, CRC Press, New York.
  • [32] Claude E. Shannon (1951), Prediction and entropy of printed English, Bell System Technical Journal, 30: 50-64.
  • [33] Ziqiang Shi (1989), The grammaticalization of the particle le in Mandarin Chinese, Language Variation and Change, 1: 99-114.
  • [34] Joseph A. Smith and Colleen Kelly (2002), Stylistic constancy and change across literary corpora: Using measures of lexical richness to date works, Computers and the Humanities, 36: 411-430.
  • [35] Michael Spivey (2007), The Continuity of Mind, Oxford University Press, Oxford.
  • [36] Kamil Stachowski (2013), The influx rate of Turkic glosses in Hungarian and Polish post-mediaeval texts, in R. Köhler and G. Altmann, editors, Issues in Quantitative Linguistics, pp. 100-116, RAM-Verlag, Lüdenscheid.
  • [37] Sune V. Steffensen and Alwin Fill (2014), Ecolinguistics: the state of the art and future horizons, Language Sciences, 41 (6): 6-25.
  • [38] Benedikt Szmrecsanyi (2005), Language users as creatures of habit: A corpus-based analysis of persistence in spoken English, Corpus Linguistics and Linguistic Theory, 11: 113-150.
  • [39] Ans van Kemenade and Bettelou Los (2014), Using historical texts, in D. Sharma and R. Podesva, editors, Research Methods in Linguistics, pp. 216-231, Cambridge University Press, Cambridge.
  • [40] Marjolijn H. Verspoor and Heike Behrens (2011), Dynamic systems theory and a usage-based approach to second language development, in M. Verspoor, K. de Bot, and W. Lowie, editors, A Dynamic Approach to Second Language Development: Methods and Techniques, pp. 25-38, John Benjamins, Amsterdam.
  • [41] Marjolijn H. Verspoor, Kees de Bot, and Wander Lowie, editors (2011), A Dynamic Approach to Second Language Development: Methods and Techniques, John Benjamins, Amsterdam.
  • [42] William S-Y. Wang (1979), Language change: a lexical perspective, Annual Review of Anthropology, 8: 353-371.
  • [43] Jeffrey S. Wicken (1987), Entropy and information: suggestions for common language, Philosophy of Science, 54: 176-193.
  • [44] Dick R. Wittink (1988), The Application of Regression Analysis, Allyn and Bacon, Boston, MA.
Notes
Record prepared with funds from MNiSW (the Polish Ministry of Science and Higher Education), agreement No. 461252, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science), module: Popularisation of science and promotion of sport (2020).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-15c7a6df-90de-4294-86d4-ac424c64ca7a