Article title

Learning reduplication with a neural network that lacks explicit variables

Publication languages
EN
Abstracts
EN
Reduplicative linguistic patterns have been used as evidence for explicit algebraic variables in models of cognition. Here, we show that a variable-free neural network can model these patterns in a way that predicts observed human behavior. Specifically, we successfully simulate the three experiments presented by Marcus et al. (1999), as well as Endress et al.’s (2007) partial replication of one of those experiments. We then explore the model’s ability to generalize reduplicative mappings to different kinds of novel inputs. Using Berent’s (2013) scopes of generalization as a metric, we claim that the model matches the scope of generalization that has been observed in humans. We argue that these results challenge past claims about the necessity of symbolic variables in models of cognition.
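As a concrete illustration of the variable-free approach the abstract describes, the sketch below trains a small LSTM encoder-decoder (the seq2seq architecture of Sutskever et al. 2014, built with Keras; both appear in the bibliography, as does the RMSprop optimizer) to map toy syllable pairs onto their ABB reduplicated forms. The syllable inventory, layer sizes, and training regime are illustrative assumptions, not the authors' reported setup.

    # Minimal sketch of a variable-free seq2seq reduplication learner.
    # Architecture and hyperparameters are assumptions for illustration,
    # not the setup reported in the paper.
    import numpy as np
    from tensorflow import keras
    from tensorflow.keras import layers

    syllables = ["wo", "fe", "ga", "ti", "de", "li"]  # toy inventory
    vocab = {s: i for i, s in enumerate(syllables)}
    V = len(vocab)

    def one_hot(seq):
        """Encode a syllable sequence as a (timesteps, V) one-hot array."""
        x = np.zeros((len(seq), V), dtype="float32")
        for t, s in enumerate(seq):
            x[t, vocab[s]] = 1.0
        return x

    # Training data: AB syllable pairs map to ABB (reduplication of B).
    # One pair is withheld to test generalization to a novel combination.
    held_out = ("li", "ga")
    pairs = [([a, b], [a, b, b])
             for a in syllables for b in syllables
             if a != b and (a, b) != held_out]
    X = np.stack([one_hot(src) for src, _ in pairs])  # (N, 2, V)
    Y = np.stack([one_hot(tgt) for _, tgt in pairs])  # (N, 3, V)

    # No symbolic variables: the encoder LSTM compresses the input into a
    # fixed vector, and the decoder LSTM unrolls it into the longer,
    # reduplicated output, one softmax over syllables per time step.
    model = keras.Sequential([
        keras.Input(shape=(2, V)),
        layers.LSTM(64),                          # encoder
        layers.RepeatVector(3),                   # stretch to output length
        layers.LSTM(64, return_sequences=True),   # decoder
        layers.TimeDistributed(layers.Dense(V, activation="softmax")),
    ])
    model.compile(optimizer="rmsprop", loss="categorical_crossentropy")
    model.fit(X, Y, epochs=300, verbose=0)

    # Test on the withheld pair of familiar syllables.
    pred = model.predict(one_hot(list(held_out))[None], verbose=0)
    print([syllables[i] for i in pred[0].argmax(-1)])  # ideally: li ga ga

Whether such a network merely memorizes or genuinely copies is exactly what the paper's scope-of-generalization tests probe; novel syllables outside the training inventory would require richer featural input representations than the one-hot coding used here.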
Year
2022
Pages
1-38
Physical description
Bibliography: 68 items, figures, tables, charts.
Authors
  • Brandon Prickett, Department of Linguistics, University of Massachusetts Amherst
  • Aaron Traylor, Department of Computer Science, Brown University
  • Joe Pater, Department of Linguistics, University of Massachusetts Amherst
Bibliography
  • 1. Adam ALBRIGHT and Bruce HAYES (2003), Rules vs. analogy in English past tenses: A computational/experimental study, Cognition, 90(2):119-161.
  • 2. Raquel G. ALHAMA and Willem ZUIDEMA (2018), Pre-Wiring and pre-training: What does a neural network need to learn truly general identity rules?, Journal of Artificial Intelligence Research, 61:927-946.
  • 3. Gerry T.M. ALTMANN (2002), Learning and development in neural networks – the importance of prior experience, Cognition, 85(2):B43-B50.
  • 4. Dzmitry BAHDANAU, Kyunghyun CHO, and Yoshua BENGIO (2015), Neural machine translation by jointly learning to align and translate, in Yoshua BENGIO and Yann LECUN, editors, 3rd International Conference on Learning Representations, Conference Track Proceedings.
  • 5. Gašper BEGUŠ (2021), Identity-based patterns in deep Convolutional Networks: Generative Adversarial Phonology and reduplication, Transactions of the Association for Computational Linguistics, 9:1180-1196.
  • 6. Yoshua BENGIO, Patrice SIMARD, and Paolo FRASCONI (1994), Learning long-term dependencies with gradient descent is difficult, IEEE Transactions on Neural Networks, 5(2):157-166.
  • 7. Iris BERENT (2013), The phonological mind, Trends in Cognitive Sciences, 17(7):319-327.
  • 8. Iris BERENT, Outi BAT-EL, Diane BRENTARI, Amanda DUPUIS, and Vered VAKNIN-NUSBAUM (2016), The double identity of linguistic doubling, Proceedings of the National Academy of Sciences, 113(48):13702-13707.
  • 9. Iris BERENT, Amanda DUPUIS, and Diane BRENTARI (2014), Phonological reduplication in sign language: Rules rule, Frontiers in Psychology, 5(560):1-15.
  • 10. Iris BERENT, Gary MARCUS, Joseph SHIMRON, and Adamantios I. GAFOS (2002), The scope of linguistic generalizations: Evidence from Hebrew word formation, Cognition, 83(2):113-139.
  • 11. Iris BERENT and Joseph SHIMRON (1997), The representation of Hebrew words: Evidence from the obligatory contour principle, Cognition, 64(1):39-72.
  • 12. François CHOLLET (2015), Keras, https://github.com/keras-team/keras.
  • 13. Noam CHOMSKY and Morris HALLE (1968), The sound pattern of English, Harper & Row.
  • 14. Morten H. CHRISTIANSEN and Suzanne L. CURTIN (1999), The power of statistical learning: No need for algebraic rules, in Martin HAHN and Scott C. STONESS, editors, Proceedings of the 21st Annual Conference of the Cognitive Science Society, pp. 114-119, Routledge.
  • 15. Alexander CLARK and Ryo YOSHINAKA (2014), Distributional learning of parallel multiple context-free grammars, Machine Learning, 96(1–2):5-31.
  • 16. David Paul CORINA (1991), Towards an understanding of the syllable: evidence from linguistic, psychological, and connectionist, PhD Thesis, University of California, San Diego.
  • 17. Maria CORKERY, Yevgen MATUSEVYCH, and Sharon GOLDWATER (2019), Are we there yet? Encoder-decoder neural networks as cognitive models of English past tense inflection, in Anna KORHONEN, David TRAUM, and Lluís MÀRQUEZ, editors, Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pp. 3868-3877.
  • 18. Ryan COTTERELL, Christo KIROV, John SYLAK-GLASSMAN, David YAROWSKY, Jason EISNER, and Mans HULDEN (2016), The SIGMORPHON 2016 shared task—morphological reinflection, in Micha ELSNER and Sandra KUEBLER, editors, Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, pp. 10-22.
  • 19. Verna DANKERS, Anna LANGEDIJK, Kate MCCURDY, Adina WILLIAMS, and Dieuwke HUPKES (2021), Generalising to German plural noun classes, from the perspective of a Recurrent Neural Network, Conference on Computational Natural Language Learning, https://aclanthology.org/2021.conll-1.8.
  • 20. Jacob DEVLIN, Ming-Wei CHANG, Kenton LEE, and Kristina TOUTANOVA (2019), BERT: Pre-training of deep bidirectional transformers for language understanding, in Jill BURSTEIN, Christy DORAN, and Thamar SOLORIO, editors, Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171-4186, https://aclanthology.org/N19-1423.
  • 21. Hossep DOLATIAN and Jeffrey HEINZ (2020), Computing and classifying reduplication with 2-way finite-state transducers, Journal of Language Modelling, 8(1):179-250.
  • 22. Leonidas DOUMAS and John E. HUMMEL (2010), A computational account of the development of the generalization of shape information, Cognitive Science, 34(4):698-712.
  • 23. Jeffrey L. ELMAN (1990), Finding structure in time, Cognitive Science, 14(2):179-211.
  • 24. Ansgar D. ENDRESS, Ghislaine DEHAENE-LAMBERTZ, and Jacques MEHLER (2007), Perceptual constraints and the learnability of simple grammars, Cognition, 105(3):577-614.
  • 25. Charles A. FERGUSON (1964), Baby talk in six languages, American Anthropologist, 66(6, Part 2):103-114.
  • Adamantios I. GAFOS (1999), The articulatory basis of locality in phonology, Taylor & Francis.
  • 26. Michael GASSER (1993), Learning words in time: Towards a modular connectionist account of the acquisition of receptive morphology, Indiana University, Department of Computer Science.
  • 27. Jila GHOMESHI, Ray JACKENDOFF, Nicole ROSEN, and Kevin RUSSELL (2004), Contrastive Focus Reduplication in English (The Salad-Salad Paper), Natural Language & Linguistic Theory, 22(2):307-357, ISSN 0167-806X, https://www.jstor.org/stable/4048061.
  • 28. Coleman HALEY and Colin WILSON (2021), Deep neural networks easily learn unnatural infixation and reduplication patterns, Proceedings of the Society for Computation in Linguistics (SCiL), pp. 427-433.
  • 29. Silke HAMANN (2010), Phonetics-phonology interface, in Nancy C. KULA, Bert BOTMA, and Kuniya NASUKAWA, editors, The Bloomsbury Companion to Phonology, Bloomsbury Companions, Bloomsbury.
  • 30. Adriana HANULIKOVA and Andrea WEBER (2010), Production of English interdental fricatives by Dutch, German, and English speakers, in Magdalena WREMBEL, Malgorzata KUL, and Katarzyna DZIUBALSKA-KOLACZYK, editors, New Sounds 2010: Sixth International Symposium on the Acquisition of Second Language Speech, pp. 173-178, Peter Lang Verlag.
  • 31. Bruce HAYES (2011), Introductory phonology, John Wiley & Sons.
  • Sepp HOCHREITER, Yoshua BENGIO, Paolo FRASCONI, and Jürgen SCHMIDHUBER (2001), Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, in A field guide to dynamical recurrent neural networks, IEEE Press.
  • 32. Michael I. JORDAN (1986), Serial order: A parallel distributed processing approach, Technical report, University of California, San Diego.
  • 33. Christo KIROV and Ryan COTTERELL (2018), Recurrent Neural Networks in linguistic theory: Revisiting Pinker & Prince (1988) and the past tense debate, Transactions of the Association for Computational Linguistics, 6:651-665.
  • 34. Kris KORREL, Dieuwke HUPKES, Verna DANKERS, and Elia BRUNI (2019), Transcoding compositionally: Using attention to find more generalizable solutions, in Proceedings of the 2019 ACL Workshop Blackbox NLP: Analyzing and Interpreting Neural Networks for NLP, pp. 1-11, Association for Computational Linguistics, doi:10.18653/v1/W19-4801.
  • 35. Ludmila I. KUNCHEVA (2014), Combining pattern classifiers: methods and algorithms, John Wiley & Sons.
  • 36. Brenden M. LAKE and Marco BARONI (2017), Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks, in Jennifer DY and Andreas KRAUSE, editors, Proceedings of the 35th International Conference on Machine Learning.
  • 37. Jean-Yves LE BOUDEC (2011), Performance evaluation of computer and communication systems, EPFL Press.
  • 38. Omer LEVY, Kenton LEE, Nicholas FITZGERALD, and Luke ZETTLEMOYER (2018), Long Short-Term Memory as a dynamically computed element-wise weighted sum, Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 732-739.
  • 39. Tal LINZEN, Emmanuel DUPOUX, and Yoav GOLDBERG (2016), Assessing the ability of LSTMs to learn syntax-sensitive dependencies, Transactions of the Association for Computational Linguistics, 4:521-535.
  • 40. Gary MARCUS (1998), Rethinking eliminative connectionism, Cognitive Psychology, 37(3):243-282.
  • 41. Gary MARCUS (1999), Do infants learn grammar with algebra or statistics? Response, Science, 284(5413):436-437.
  • 42. Gary MARCUS (2001), The algebraic mind, MIT Press, Cambridge, MA.
  • Gary MARCUS, Sugumaran VIJAYAN, S. Bandi RAO, and Peter M. VISHTON (1999), Rule learning by seven-month-old infants, Science, 283(5398):77-80.
  • 43. Reiko MAZUKA, Tadahisa KONDO, and Akiko HAYASHI (2008), Japanese mothers’ use of specialized vocabulary in infant-directed speech: infant-directed vocabulary in Japanese, in The origins of language, pp. 39-58, Springer.
  • 44. R. Thomas MCCOY, Erin GRANT, Paul SMOLENSKY, Thomas L GRIFFITHS, and Tal LINZEN (2020), Universal linguistic inductive biases via meta-learning, Proceedings of the 42nd Annual Conference of the Cognitive Science Society.
  • 45. Richard Thomas MCCOY, Robert FRANK, and Tal LINZEN (2018), Revisiting the poverty of the stimulus: hierarchical generalization without a hierarchical bias in recurrent neural networks, in Chuck KALISH, Martina RAU, Jerry ZHU, and Timothy ROGERS, editors, Proceedings of Cog Sci 2018, pp. 2096-2101.
  • 46. Elliott MORETON, Brandon PRICKETT, Katya PERTSOVA, Josh FENNELL, Joe PATER, and Lisa SANDERS (2021), Learning repetition, but not syllable reversal, in Ryan BENNETT, Richard BIBBS, Mykel L. BRINKERHOFF, Max J. KAPLAN, Stephanie RICH, Amanda RYSLING, Nicholas VAN HANDEL, and Maya Wax CAVALLARO, editors, Proceedings of the Annual Meetings on Phonology.
  • 47. Breyne Arlene MOSKOWITZ (1975), The acquisition of fricatives: A study in phonetics and phonology, Journal of Phonetics, 3(3):141-150.
  • 48. Max NELSON, Hossep DOLATIAN, Jonathan RAWSKI, and Brandon PRICKETT (2020), Probing RNN Encoder-Decoder generalization of subregular functions using reduplication, Proceedings of the Society for Computation in Linguistics (SCiL), pp. 31-42.
  • 49. Andrew NEVINS and Bert VAUX (2003), Metalinguistic, shmetalinguistic: The phonology of shmreduplication, in J. CIHLAR, A. FRANKLIN, D. KAISER, and J. KIMBARA, editors, Proceedings from the 39th Annual Meeting of the Chicago Linguistic Society, pp. 702-721, Chicago Linguistic Society.
  • 50. Steven PINKER and Alan PRINCE (1988), On language and connectionism: Analysis of a parallel distributed processing model of language acquisition, Cognition, 28(1):73-193.
  • 51. Brandon PRICKETT (2019), Learning biases in opaque interactions, Phonology, 36(4):627-653, doi:10.1017/S0952675719000320.
  • 52. Hugh RABAGLIATI, Brock FERGUSON, and Casey LEW-WILLIAMS (2019), The profile of abstract rule learning in infancy: Meta-analytic and experimental evidence, Developmental Science, 22:1-18.
  • 53. Fariz RAHMAN (2016), seq2seq: Sequence to sequence learning with Keras, https://github.com/farizrahman4u/seq2seq.
  • 54. D. Victoria RAU, Hui-Huan Ann CHANG, and Elaine E. TARONE (2009), Think or sink: Chinese learners’ acquisition of the English voiceless interdental fricative, Language Learning, 59(3):581-621.
  • 55. D.E. RUMELHART and J.L. MCCLELLAND (1986), On learning the past tenses of English verbs, in J.L. MCCLELLAND and D.E. RUMELHART, editors, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, volume 2: Psychological and Biological Models, pp. 216-271, The MIT Press.
  • 56. Mark S. SEIDENBERG and Jeff L. ELMAN (1999), Do infants learn grammar with algebra or statistics?, Science, 284(5413):433.
  • 57. Thomas R. SHULTZ and Alan C. BALE (2001), Neural network simulation of infant familiarization to artificial sentences: Rule-like behavior without explicit rules and variables, Infancy, 2(4):501-536.
  • 58. Mary H. SKEEL (1969), Perceptual confusions among fricatives in preschool children, Technical report, The University of Wisconsin, https://files.eric.ed.gov/fulltext/ED036789.pdf.
  • 59. Nitish SRIVASTAVA, Geoffrey HINTON, Alex KRIZHEVSKY, Ilya SUTSKEVER, and Ruslan SALAKHUTDINOV (2014), Dropout: A simple way to prevent neural networks from overfitting, The Journal of Machine Learning Research, 15(1):1929-1958.
  • 60. Pavol ŠTEKAUER, Salvador VALERA, and Lívia KÖRTVÉLYESSY (2012), Word-formation in the world’s languages: a typological survey, Cambridge University Press.
  • 61. Joseph Paul STEMBERGER and Marshall LEWIS (1986), Reduplication in Ewe: Morphological accommodation to phonological errors, Phonology, 3:151-160.
  • 62. Ilya SUTSKEVER, Oriol VINYALS, and Quoc V. LE (2014), Sequence to sequence learning with neural networks, in Advances in neural information processing systems, pp. 3104-3112.
  • 63. Tijmen TIELEMAN and Geoffrey HINTON (2012), Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude, COURSERA: Neural Networks for Machine Learning, 4(2):26-31.
  • 64. Guillermo VALLE-PEREZ, Chico Q. CAMARGO, and Ard A. LOUIS (2018), Deep learning generalizes because the parameter-function map is biased towards simple functions, in Yoshua BENGIO and Yann LECUN, editors, Proceedings of the 6th International Conference on Learning Representations.
  • 65. Rachelle WAKSLER (1999), Cross-linguistic evidence for morphological representation in the mental lexicon, Brain and Language, 68(1–2):68-74.
  • 66. Yang WANG (2021), Recognizing reduplicated forms: Finite-state buffered machines, in Garrett NICOLAI, Kyle GORMAN, and Ryan COTTERELL, editors, Proceedings of the 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, pp. 177-187.
  • 67. Janet F. WERKER and Richard C. TEES (1983), Developmental changes across childhood in the perception of non-native speech sounds, Canadian Journal of Psychology/Revue Canadienne de Psychologie, 37(2):278-286.
  • 68. Colin WILSON (2019), Re (current) reduplication: Interpretable neural network models of morphological copying, Proceedings of the Society for Computation in Linguistics (SCiL), 2:379-380.
Notes
Record developed with funds from the Ministry of Education and Science (MEiN), agreement no. SONP/SP/546092/2022, under the "Social Responsibility of Science" programme, module: Popularisation of Science and Promotion of Sport (2022-2023).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-c07d7eab-8679-4378-8bdc-f41ccb0fac6c