Simplicity and learning to distinguish arguments from modifiers

Bergen, Leon; Gibson, Edward; O'Donnell, Timothy J.

doi:10.15398/jlm.v10i2.263

Powiadomienia systemowe

Sesja wygasła!
Sesja wygasła!

Artykuł - szczegóły

Tytuł artykułu

Simplicity and learning to distinguish arguments from modifiers

Autorzy

Bergen Leon , Gibson Edward , O'Donnell Timothy J.

Treść / Zawartość

Pełne teksty:

Pobierz

Identyfikatory

DOI

10.15398/jlm.v10i2.263

Warianty tytułu

Języki publikacji

Abstrakty

We present a learnability analysis of the argument-modifier distinction, asking whether there is information in the distribution of English constituents that could allow learners to identify which constituents are arguments and which are modifiers. We first develop a general description of some of the ways in which arguments and modifiers differ in distribution. We then identify two models from the literature that can capture these differences, which we call the argument-only model and the argument-modifier model. We employ these models using a common learning framework based on two simplicity biases which tradeoff against one another. The first bias favors a small lexicon with highly reusable lexical items, and the second, opposing, bias favors simple derivations of individual forms – those using small numbers of lexical items. Our first empirical study shows that the argument-modifier model is able to recover the argument-modifier status of many individual constituents when evaluated against a gold standard. This provides evidence in favor of our general account of the distributional differences between arguments and modifiers. It also suggests a kind of lower bound on the amount of information that a suitably equipped learner could use to identify which phrases are arguments or modifiers. We then present a series of analyses investigating how and why the argument-modifier model is able to recover the argument-modifier status of some constituents. In particular, we show that the argumentmodifier model is able to provide a simpler description of the input corpus than the argument-only model, both in terms of lexicon size, and in terms of the complexity of individual derivations. Intuitively, the argument-modifier model is able to do this because it is able to ignore spurious modifier structure when learning the lexicon. These analyses further support our general account of the differences between arguments and modifiers, as well as our simplicity-based approach to learning.

Słowa kluczowe

linguistics machine learning computational linguistics syntax statistics

Wydawca

Instytut Podstaw Informatyki PAN

Czasopismo

Journal of Language Modelling

Rocznik

2022

Tom

Vol. 10, No. 2

Strony

241--286

Opis fizyczny

Bbibliogr. 105 poz., rys., tab., wykr.

Twórcy

autor

Bergen Leon

Department of Linguistics, University of California San Diego, San Diego, California

autor

Gibson Edward

gibson@mit.edu

Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, Massachusetts

https://orcid.org/0000-0002-5912-883X

autor

O'Donnell Timothy J.

timothy.odonnell@mcgill.ca

McGill University, Canada CIFAR AI Chair, Mila

https://orcid.org/0000-0002-5711-977X

Bibliografia

1. Leon BERGEN, Edward GIBSON, and Timothy J. O’DONNELL (2015), A learnability analysis of argument and modifier structure, lingbuzz, (lingbuzz/002502).
2. Robert C. BERWICK (1982), Locality principles and the acquisition of syntactic knowledge, Ph.D. thesis, Massachusetts Institute of Technology.
3. Robert C. BERWICK (1985), The acquisition of syntactic knowledge, The MIT Press, Cambridge, Massachusetts and London, England.
4. Rens BOD (1998), Beyond grammar: An experience-based theory of language, CSLI Publications, Palo Alto, CA.
5. Rens BOD, Remko SCHA, and Khalil SIMA’AN, editors (2003), Data-oriented parsing, CSLI, Palo Alto, CA.
6. Robert D. BORSLEY (1999), Syntactic theory: A unified approach, Edward Arnold, London, England.
7. Michael R. BRENT (1997), Toward a unified model of lexical acquisition and lexical access, Journal of Psycholinguistic Research, 26(3):363–375.
8. Michael R. BRENT (1999), An efficient, probabilistically sound algorithm for segmentation and word discovery, Machine Learning, 34:71–105.
9. Joan BRESNAN (2001), Lexical functional syntax, Wiley-Blackwell, Oxford, England.
10. Roger W. BROWN (1973), A first language: The early stages, Harvard University Press, Cambridge, MA.
11. Timothy A. CARTWRIGHT and Michael R. BRENT (1994), Segmenting speech without a lexicon: Evidence for a bootstrapping model of lexical acquisition, in Proceedings of the 16th Annual Meeting of the Cognitive Science Society.
12. David CHIANG (2000), Statistical parsing with an automatically-extracted tree adjoining grammar, in Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics.
13. David CHIANG and Daniel BIKEL (2002), Recovering latent information in treebanks, in Proceedings of COLING 2002.
14. Noam CHOMSKY (1951 [1979]), Morphophonemics of modern Hebrew, Garland Publishing, New York, NY.
15. Noam CHOMSKY (1955 [1975]), The logical structure of linguistic theory, Plenum Press, New York, NY.
16. Noam CHOMSKY (1964), Current issues in linguistic theory, Janua Linguarum: Studia Memoriae Nicolai van Wijk Dedicata, Mouton, The Hague, The Netherlands.
17. Noam CHOMSKY (1970), Remarks on nominalization, in Roderick JACOBS and Peter ROSENBAUM, editors, Readings in English Transformational Grammar, Ginn and Company, Waltham, MA.
18. Noam CHOMSKY (1993), A minimalist program for linguistic theory, in Kenneth L. HALE and Samuel Jay KEYSER, editors, The View from Building 20: Essays in Honor of Sylvain Bromberger, pp. 1–52, The MIT Press, Cambridge, Massachusetts and London, England.
19. Noam CHOMSKY (1995a), Bare phrase structure, in Gerth WEBELHUTH, editor, Government and Binding Theory and the Minimalist Program, pp. 383–349, Blackwell.
20. Noam CHOMSKY (1995b), The minimalist program, The MIT Press, Cambridge, MA.
21. Trevor COHN, Phil BLUNSOM, and Sharon GOLDWATER (2010), Inducing tree-substitution grammars, Journal of Machine Learning Research, 11:3053–3096.
22. Bernard COMRIE (1993), Argument structure, in Joachim JACOBS, Arnim VON STECHOW, Wolfgang STERNEFELD, and Theo VENNEMAN, editors, Syntax: An International Handbook, pp. 905–914, Walter de Gruyter, Berlin, Germany.
23. Denis CREISSELS (2014), Cross-linguistic variation in the treatment of beneficiaries and the argument vs. adjunct distinction, Linguistic Discovery, 12(2).
24. William CROFT (2001), Radical construction grammar: Syntactic theory in typological perspective, Oxford University Press, Oxford, England.
25. Peter CULICOVER and Ray JACKENDOFF (2005), Simpler syntax, Oxford University Press, Oxford, England.
26. Carl DE MARCKEN (1996a), The unsupervised acquisition of a lexicon from continuous speech, Technical Report AI-memo-1558, CBCL-memo-129, Massachusetts Institute of Technology – Artificial Intelligence Laboratory.
27. Carl DE MARCKEN (1996b), Unsupervised language acquisition, Ph.D. thesis, Massachusetts Institute of Technology.
28. Jacob FELDMAN (2000), Minimization of Boolean complexity in human concept learning, Nature, 407(6804):630–633.
29. Jenny Rose FINKEL, Trond GRENAGER, and Christopher D. MANNING (2007), The infinite tree, in Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics.
30. Diana FORKER (2014), A canonical approach to the argument/adjunct distinction, Linguistic Discovery, 12(2).
31. L. T. F. GAMUT (1991), Logic, language, and meaning volume II: Intensional logic and logical grammar, University of Chicago Press, Chicago, IL.
32. Gerald GAZDAR, Ewan KLEIN, Geoffrey K. PULLUM, and Ivan A. SAG (1985), Generalized phrase structure grammar, Harvard University Press, Cambridge, MA.
33. John Anton GOLDSMITH (2011), The evaluation metric in generative grammar, in Proceedings of the 50th Anniversay Celebration of the MIT Department of Linguistics.
34. Sharon GOLDWATER (2006), Nonparametric bayesian models of lexical acquisition, Ph.D. thesis, Brown University.
35. Sharon GOLDWATER, Thomas L. GRIFFITHS, and Mark JOHNSON (2006), Interpolating between types and tokens by estimating power-law generators, in Advances in Neural Information Processing Systems 18.
36. Noah D. GOODMAN, Joshua B. TENENBAUM, Jacob FELDMAN, and Thomas L. GRIFFITHS (2008), A rational analysis of rule-based concept learning, Cognitive Science, 32(1):108–154.
37. Peter D. GRÜNWALD (2007), The minimum description length principle, The MIT Press, Cambridge, MA.
38. Liliane HAEGEMAN (1994), Introduction to government and binding theory, Blackwell, Oxford, England.
39. Martin HASPELMATH (2014), Arguments and adjuncts as language-particular syntactic categories and as comparative concepts, Linguistic Discovery, 12(2).
40. Irene HEIM and Angelika KRATZER (1998), Semantics in generative grammar, Blackwell Publishing, Malden, MA.
41. Norbert HORNSTEIN and David W. LIGHTFOOT (1981), Introduction to explanation in linguistics: The logical problem of language acquisition, Addison Wesley Longman, Upper Saddle River, NJ.
42. Anne S. HSU and Nick CHATER (2010), The logical problem of language acquisition goes probabilistic: No negative evidence as a window on language acquisition, Cognitive Science, 34:972–1016.
43. Anne S. HSU, Nick CHATER, and Paul M. B. VITÁNYI (2011), The probabilistic analysis of language acquisition: Theoretical, computational, and experimental analysis, Cognition, 120:380–390.
44. Anne S. HSU, Nick CHATER, and Paul M. B. VITÁNYI (2013), Language learning for positive evidence reconsidered: A simplicity-based approach, Topics in Cognitive Science, 5:35–55.
45. Rodney HUDDLESTON and Geoffrey K. PULLUM (2002), The cambridge grammar of English language, Cambridge University Press, Cambridge, England.
46. Ray JACKENDOFF (2002), Foundations of language, Oxford University Press, New York, NY.
47. David E. JOHNSON and Paul M. POSTAL (1980), Arc pair grammar, Princeton University Press, Princeton, NJ.
48. Mark JOHNSON, Thomas L. GRIFFITHS, and Sharon GOLDWATER (2007), Adaptor Grammars: A framework for specifying compositional nonparametric Bayesian models, in Advances in Neural Information Processing Systems 19, MIT Press, Cambridge, MA.
49. Aravind K. JOSHI and Leon S. LEVY (1975), Tree adjunct grammars, Journal of Computer and System Sciences, 10:136–163.
50. Jean-Pierre KOENIG, Gail MAUNER, and Breton BIENVENUE (2003), Arguments for adjuncts, Cognition, 89:67–103.
51. Paul R. KROEGER (2004), Analyzing syntax: A lexical-functional approach, Cambridge University Press, Cambridge, England.
52. Ming LI and Paul M. B. VITÁNYI (2008), An introduction to Kolmogorov complexity and its applications, Springer, Berlin, Germany, third edition.
53. Percy LIANG, Slav PETROV, Michael I. JORDAN, and Dan KLEIN (2007), The infinite PCFG using hierarchical Dirichlet processes, in Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pp. 688–697.
54. Brian MACWHINNEY (2000), The childes project: Tools for analyzing talk.,
55. Lawrence Erlbaum Associates, Mahwah, NJ. Alec MARANTZ (2013), Verbal argument structure: Events and participants, Lingua, 130:152–168.
56. Mitchell P. MARCUS, Beatrice SANTORINI, Mary Ann MARCINKIEWICZ, and Ann TAYLOR (1999), Treebank-3, Technical report, Linguistic Data Consortium, Philadelphia.
57. Peter H. MATTHEWS (1981), Syntax, Cambridge University Press, Cambridge, England.
58. Sally MCCONNELL-GINET and Gennaro CHIERCHIA (2000), Meaning and grammar: An introduction to semantics, MIT Press, Cambridge, MA.
59. Igor MEL’ČUK (1988), Dependency syntax : Theory and practice, The SUNY Press, Albany, NY.
60. Michael MOORTGAT (1997), Categorial type logics, in Handbook of Logic and Language, pp. 93–177, Elsevier.
61. Timothy J. O’DONNELL (2011), Productivity and reuse in language, Ph.D. thesis, Harvard University, Cambridge, MA.
62. Timothy J. O’DONNELL (2015), Productivity and reuse in language: A theory of linguistic computation and storage, The MIT Press, Cambridge, MA.
63. Timothy J. O’DONNELL, Jesse SNEDEKER, Joshua B. TENENBAUM, and Noah D. GOODMAN (2011), Productivity and reuse in language, in Proceedings of the 33rd Annual Conference of the Cognitive Science Society, Boston, MA.
64. Martha PALMER, P. KINGSBURY, and Daniel GILDEA (2005), The proposition bank: An annotated corpus of semantic roles, Computational Linguistics, 31(1):71–106.
65. Lisa PEARL and Sharon GOLDWATER (2016), Statistical learning, inductive bias, and Bayesian inference in language acquisition, in Jeffrey LIDZ, William SNYDER, and Joe PATER, editors, The Oxford Handbook of Developmental Linguistics, Oxford University Press, Oxford, England.
66. Lisa PEARL and Jon SPROUSE (2013), Syntactic islands and learning biases: Combining experimental syntax and computational modeling to investigate the language acquisition problem, Language Acquisition, 20:23–68.
67. Amy PERFORS, Joshua B. TENENBAUM, and Terry REGIER (2011), The learnability of abstract syntactic principles, Cognition, 118(3):306–338.
68. Lawrence PHILLIPS and Lisa PEARL (2014), The utility of cognitive plausibility in language acquisition modeling: Evidence from word segmentation, manuscript.
69. Steven Thomas PIANTADOSI (2011), Learning and the language of thought, Ph.D. thesis, Massachusetts Institute of Technology.
70. Steven Thomas PIANTADOSI (2021), The computational origin of representation, Minds and Machines, 31:1–58.
71. Jim PITMAN and Marc YOR (1995), The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator, Technical report, Department of Statistics University of California, Berkeley.
72. Carl POLLARD and Ivan A. SAG (1994), Head-driven phrase structure grammar, University of Chicago Press, Chicago, IL.
73. Matt POST and Daniel GILDEA (2009), Bayesian learning of a tree substitution grammar, in Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP.
74. Matt POST and Daniel GILDEA (2013), Bayesian tree substitution grammars as a usage-based approach, Language and Speech, 56(3):291–308.
75. Adam PRZEPIÓRKOWSKI (1999a), Case assignment and the complement/adjunct dichotomy, Ph.D. thesis, Neuphilologischen Fakultät der Universität Tübingen, Tübingen.
76. Adam PRZEPIÓRKOWSKI (1999b), On case assignment and “adjuncts as complements”, in Gerth WEBELHUTH, Jean-Pierre KOENIG, and A. KATHOL, editors, Lexical and Constructional Aspects of Linguistic Explanation, pp. 223–245, CSLI Publications.
77. Adam PRZEPIÓRKOWSKI (2017), Hierarchical lexicon and the argument/adjunct distinction, in Proceedings of the Lexical Functional Grammar 2017 (LFG’17) Conference, University of Konstanz.
78. Andrew RADFORD (1988), Transformational grammar: A first course, Cambridge University Press, Cambridge, England.
79. György RÁKOSI (2006), Dative experiencer predicates in Hungarian, Ph.D. thesis, Universiteit Utrecht.
80. Owen RAMBOW, K. VIJAY-SHANKER, and David WEIR (1995), D-tree grammars, in Proceedings of the 33rd annual meeting on Association for Computational Linguistics, Association for Computational Linguistics.
81. Ezer RASIN and Roni KATZIR (2016), On evaluation metrics in optimality theory, Linguistic Inquiry, 47(2):235–282.
82. Jorma RISSANEN (1978), Modeling by shortest data description, Automaticata, 14(5):465–471.
83. Ivan A. SAG (2012), Sign-based Construction Grammar: An informal synopsis, in Hans BOAS and Ivan A. SAG, editors, Sign-Based Construction Grammar, pp. 101–107, CSLI Publications, Palo Alto, CA.
84. Remko SCHA (1990), Taaltheorie en taaltechnologie; competence en performance, in R. DE KORT and G.L.J. LEERDAM, editors, Computertoepassingen in de Neerlandistiek, pp. 7–22, Landelijke Vereniging van Neerland.
85. Remko SCHA (1992), Virtuele grammatica’s en creatieve algoritmes, Gramma/TTT, 1(1):57–77.
86. Yves SCHABES and Stuart M. SHIEBER (1994), An alternative conception of tree-adjoining derivation, Computational Linguistics, 20(1):91–124.
87. Yves SCHABES and Richard C. WATERS (1995), Tree insertion grammar: A cubic-time parsable formalism that lexicalizes context-free grammar without changing the trees produced, Computational Linguistics, 21(4):479–513.
88. Carson T. SCHÜTZE (1995), PP attachment and argumenthood, Technical report, Papers on language processing and acquisition, MIT working papers in linguistics, Cambridge, Ma.
89. Carson T. SCHÜTZE and Edward GIBSON (1999), Argumenthood and English prepositional phrase attachment, Journal of Memory and Language, 40:409–431.
90. Ray SOLOMONOFF (1978), Complexity-based induction systems: comparisons and convergence theorems, IEEE Transactions on Information Theory, 24(4):422–432.
91. Ray J. SOLOMONOFF (1964a), A formal theory of inductive inference. Part I, Information and Control, 7(1):1–22.
92. Ray J. SOLOMONOFF (1964b), A formal theory of inductive inference. Part II, Information and Control, 7(2):224–254.
93. Edward P. STABLER (1997), Derivational minimalism, in Logical Aspects of Computational Linguistics, Springer, Berlin, Germany.
94. Mark STEEDMAN (2000), The syntactic process, MIT Press, Cambridge, MA.
95. Andreas STOLCKE and Stephen OMOHUNDRO (1994), Inducing probabilistic grammars by Bayesian model merging, in Proceedings of the International Conference on Grammatical Inference.
96. Maggie TALLERMAN (2015), Understanding syntax, Routledge, London, England, fourth edition.
97. Yee Whye TEH (2006), A Bayesian interpretation of interpolated Kneser-Ney, Technical Report TRA2/06, National University of Singapore, School of Computing.
98. Damon TUTUNJIAN and Julie E. BOLAND (2008), Do we need a distinction between arguments and adjuncts? Evidence from psycholinguistic studies of comprehension, Language and Linguistics Compass, 2(4):631–646.
99. Heinz VATER (1978), On the possibility of distinguishing between complements and adjuncts, in Valence, semantic case and grammatical relations, pp. 21–45, John Benjamins.
100. Søren WICHMANN (2014), Arguments and adjuncts cross-linguistically: A brief introduction, Linguistic Discovery, 12(2).
101. J. Gerard WOLFF (1977), The discovery of segments in natural language, British Journal of Psychology, 68:97–106.
102. J. Gerard WOLFF (1980), Language acquisition and the discovery of phrase structure, Language and Speech, 23(3):255–269.
103. J. Gerard WOLFF (1982), Language acquisition, data compression, and generalisation, Language and Communication, 2(1):57–89.
104. Yuan YANG and Steven Thomas PIANTADOSI (2022), One model for the learning of language, Proceedings of the National Academy of Sciences, 119(5).
105. Arnold M. ZWICKY (1993), Heads, bases, and functors, in Greville G. CORBETT, Norman M. FRASER, and Scott MCGLASHAN, editors, Heads in Grammatical Theory, pp. 292–315, Cambridge University Press, Cambridge, England.

Uwagi

Opracowanie rekordu ze środków MEiN, umowa nr SONP/SP/546092/2022 w ramach programu "Społeczna odpowiedzialność nauki" - moduł: Popularyzacja nauki i promocja sportu (2022-2023).

Typ dokumentu

Bibliografia

Identyfikator YADDA

bwmeta1.element.baztech-f4548251-8c7c-4617-8dc6-f88ca54b6744