Modelling a subregular bias in phonological learning with Recurrent Neural Networks

Prickett, Brandon

doi:10.15398/jlm.v9i1.251

Article details

Abstract

A number of experiments have demonstrated what seems to be a bias in human phonological learning for patterns that are simpler according to Formal Language Theory (Finley and Badecker 2008; Lai 2015; Avcu 2018). This paper demonstrates that a sequence-to-sequence neural network (Sutskever et al. 2014), which has no such restriction explicitly built into its architecture, can successfully capture this bias. These results suggest that a bias for patterns that are simpler according to Formal Language Theory may not need to be explicitly incorporated into models of phonological learning.

Keywords

neural networks; learning bias; formal language theory; phonology

Publisher

Instytut Podstaw Informatyki PAN

Journal

Journal of Language Modelling

Year

2021

Volume

Vol. 9, No. 1

Pages

67-96

Physical description

Bibliography: 70 items; figures, tables, charts.

Authors

author

Prickett, Brandon

bprickett@umass.edu

University of Massachusetts Amherst

Bibliography

[1]. John ALDERETE and Paul TUPPER (2018), Connectionist Approaches to Generative Phonology, The Routledge Handbook of Phonological Theory. Routledge.
[2]. Afra ALISHAHI, Grzegorz CHRUPAŁA, and Tal LINZEN (2019), Analyzing and Interpreting Neural Networks for NLP: A Report on the First BlackboxNLP Workshop, arXiv preprint arXiv:1904.04063.
[3]. Enes AVCU (2018), Experimental Investigation of the Subregular Hypothesis, in Proceedings of the 35th West Coast Conference on Formal Linguistics, pp. 77-86.
[4]. Dzmitry BAHDANAU, Kyunghyun CHO, and Yoshua BENGIO (2015), Neural Machine Translation by Jointly Learning to Align and Translate, in 3rd International Conference on Learning Representations, Conference Track Proceedings.
[5]. Eric BAKOVIC (1999), Assimilation to the Unmarked, University of Pennsylvania Working Papers in Linguistics, 6(1):2.
[6]. Eric BAKOVIC (2000), Harmony, Dominance and Control, PhD Thesis, Rutgers University.
[7]. Peter W BATTAGLIA, Jessica B HAMRICK, Victor BAPST, Alvaro SANCHEZ-GONZALEZ, Vinicius ZAMBALDI, Mateusz MALINOWSKI, Andrea TACCHETTI, David RAPOSO, Adam SANTORO, Ryan FAULKNER, et al. (2018), Relational Inductive Biases, Deep Learning, and Graph Networks, arXiv preprint arXiv:1806.01261.
[8]. Yoshua BENGIO, Patrice SIMARD, and Paolo FRASCONI (1994), Learning Long-term Dependencies with Gradient Descent is Difficult, IEEE Transactions on Neural Networks, 5(2):157-166.
[9]. Wm G BENNETT (2015), The Phonology of Consonants: Harmony, Dissimilation and Correspondence, Cambridge University Press.
[10]. Phillip BURNESS and Kevin MCMULLIN (2019), Efficient Learning of Output Tier-based Strictly 2-local Functions, in Proceedings of the 16th Meeting on the Mathematics of Language, pp. 78-90.
[11]. Jane CHANDLEE (2014), Strictly Local Phonological Processes, PhD Thesis, University of Delaware.
[12]. Jane CHANDLEE, Rémi EYRAUD, and Jeffrey HEINZ (2015), Output Strictly Local Functions, in 14th Meeting on the Mathematics of Language, pp. 112-125.
[13]. Jane CHANDLEE, Rémi EYRAUD, and Jeffrey HEINZ (2014), Learning Strictly Local Subsequential Functions, Transactions of the Association for Computational Linguistics, 2:491-504.
[14]. Kyunghyun CHO, Bart VAN MERRIËNBOER, Dzmitry BAHDANAU, and Yoshua BENGIO (2014), On the Properties of Neural Machine Translation: Encoder-Decoder Approaches, in Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, pp. 103-111, Association for Computational Linguistics.
[15]. Noam CHOMSKY (1956), Three Models for the Description of Language, IRE Transactions on Information Theory, 2(3):113-124.
[16]. Noam CHOMSKY and Morris HALLE (1968), The Sound Pattern of English, Harper & Row.
[17]. Amanda DOUCETTE (2017), Inherent Biases of Recurrent Neural Networks for Phonological Assimilation and Dissimilation, in Proceedings of the 7th Workshop on Cognitive Modeling and Computational Linguistics, pp. 35-40.
[18]. Jeffrey L. ELMAN (1990), Finding Structure in Time, Cognitive Science, 14(2):179-211.
[19]. Igor FARKAŠ (2008), Learning Nonadjacent Dependencies with a Recurrent Neural Network, in International Conference on Neural Information Processing, pp. 292-299, Springer.
[20]. Sara FINLEY (2017), Locality and Harmony: Perspectives from Artificial Grammar Learning, Language and Linguistics Compass, 11(1):1-16.
[21]. Sara FINLEY and William BADECKER (2008), Analytic Biases for Vowel Harmony Languages, in West Coast Conference on Formal Linguistics, volume 27, pp. 168-176.
[22]. Michael GASSER (1993), Learning Words in Time: Towards a Modular Connectionist Account of the Acquisition of Receptive Morphology, Indiana University, Department of Computer Science.
[23]. Michael GASSER and Chan-Do LEE (1992), Networks that Learn about Phonological Feature Persistence, in Connectionist Natural Language Processing, pp. 349-362, Springer.
[24]. Thomas GRAF and Connor MAYER (2018), Sanskrit n-Retroflexion is Input-Output Tier-Based Strictly Local, in Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology, pp. 151-160.
[25]. David Marvin GREEN and John A. SWETS (1966), Signal Detection Theory and Psychophysics, volume 1, Wiley.
[26]. Mary HARE (1990), The Role of Trigger-target Similarity in the Vowel Harmony Process, in Annual Meeting of the Berkeley Linguistics Society, volume 16, pp. 140-152.
[27]. Jeffrey HEINZ (2010), Learning Long-distance Phonotactics, Linguistic Inquiry, 41(4):623-661.
[28]. Jeffrey HEINZ (2018), The Computational Nature of Phonological Generalizations, in Phonological Typology, pp. 126-195, De Gruyter Mouton.
[29]. Jeffrey HEINZ and William IDSARDI (2011), Sentence and Word Complexity, Science, 333(6040):295-297.
[30]. Jeffrey HEINZ and Regine LAI (2013), Vowel Harmony and Subsequentiality, in Proceedings of the 13th Meeting on the Mathematics of Language (MoL 13), pp. 52-63.
[31]. Jeffrey HEINZ, Chetan RAWAL, and Herbert G TANNER (2011), Tier-based Strictly Local Constraints for Phonology, in Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers-Volume 2, pp. 58-64, Association for Computational Linguistics.
[32]. Adam JARDINE and Jeffrey HEINZ (2016), Learning Tier-based Strictly 2-local Languages, Transactions of the Association for Computational Linguistics, 4:87-98.
[33]. C. Douglas JOHNSON (1972), Formal Aspects of Phonological Description, Mouton & Co. N.V.
[34]. Michael I. JORDAN (1986), Serial Order: A Parallel Distributed Processing Approach, Technical report, University of California, San Diego.
[35]. Ronald M. KAPLAN and Martin KAY (1994), Regular Models of Phonological Rule Systems, Computational Linguistics, 20(3):331-378.
[36]. Diederik P. KINGMA and Jimmy BA (2015), Adam: A Method for Stochastic Optimization, in 3rd International Conference on Learning Representations, Conference Track Proceedings.
[37]. Christo KIROV and Ryan COTTERELL (2018), Recurrent Neural Networks in Linguistic Theory: Revisiting Pinker and Prince (1988) and the Past Tense Debate, Transactions of the Association for Computational Linguistics, 6:651-665.
[38]. Kenneth J. KURTZ (2007), The Divergent Autoencoder (DIVA) Model of Category Learning, Psychonomic Bulletin & Review, 14(4):560-576.
[39]. Regine LAI (2015), Learnable vs. Unlearnable Harmony Patterns, Linguistic Inquiry, 46(3):425-451.
[40]. Andrew LAMONT (2018), Precedence is Pathological: The Problem of Alphabetical Sorting, Proceedings of the 36th West Coast Conference on Formal Linguistics, pp. 243-249.
[41]. Andrew LAMONT (2019a), Majority Rule in Harmonic Serialism, in Proceedings of the Annual Meetings on Phonology, volume 7.
[42]. Andrew LAMONT (2019b), Sour Grapes is Phonotactically Complex, Linguistic Society of America, 2019 Annual Meeting.
[43]. Feifei LI, Shan JIANG, Xiuyan GUO, Zhiliang YANG, and Zoltan DIENES (2013), The Nature of the Memory Buffer in Implicit Learning: Learning Chinese Tonal Symmetries, Consciousness and Cognition, 22(3):920-930.
[44]. Linda LOMBARDI (1999), Positional Faithfulness and Voicing Assimilation in Optimality Theory, Natural Language & Linguistic Theory, 17(2):267-302.
[45]. R. Duncan LUCE (1959), Individual Choice Behavior, Dover Publications.
[45a]. Gary MARCUS, Sugumaran VIJAYAN, S. Bandi RAO, and Peter M. VISHTON (1999), Rule Learning by Seven-month-old Infants, Science, 283(5398):77-80.
[46]. R. Thomas MCCOY, Robert FRANK, and Tal LINZEN (2020), Does Syntax Need to Grow on Trees? Sources of Hierarchical Inductive Bias in Sequence-to-sequence Networks, Transactions of the Association for Computational Linguistics, 8:125-140.
[47]. Kevin MCMULLIN and Gunnar Ólafur HANSSON (2019), Inductive Learning of Locality Relations in Segmental Phonology, Laboratory Phonology: Journal of the Association for Laboratory Phonology, 10(1).
[48]. Kevin James MCMULLIN (2016), Tier-based Locality in Long-distance Phonotactics: Learnability and Typology, PhD Thesis, University of British Columbia.
[49]. Elliott MORETON and Joe PATER (2012), Structure and Substance in Artificial-phonology Learning, Part I: Structure, Language and Linguistics Compass, 6(11):686-701.
[50]. Elliott MORETON, Joe PATER, and Katya PERTSOVA (2017), Phonological Concept Learning, Cognitive Science, 41(1):4-69.
[51]. Max NELSON, Hossep DOLATIAN, Jonathan RAWSKI, and Brandon PRICKETT (2020), Probing RNN Encoder-decoder Generalization of Subregular Functions using Reduplication, Proceedings of the Society for Computation in Linguistics, 3(1):31-42.
[52]. Elissa L. NEWPORT and Richard N. ASLIN (2004), Learning at a Distance I. Statistical Learning of Non-adjacent Dependencies, Cognitive Psychology, 48(2):127-162.
[53]. Charlie O’HARA and Caitlin SMITH (2019), Computational Complexity and Sour-Grapes-like Patterns, in Proceedings of the Annual Meetings on Phonology, volume 7.
[54]. Brandon PRICKETT (2019), Learning Biases in Opaque Interactions, Phonology, 36(4):627-653.
[55]. Brandon PRICKETT, Aaron TRAYLOR, and Joe PATER (2018), Seq2Seq Models with Dropout can Learn Generalizable Reduplication, in Proceedings of the Fifteenth Workshop on Computational Research in Phonetics, Phonology, and Morphology, pp. 93-100.
[56]. Alan PRINCE and Bruce TESAR (2004), Learning Phonotactic Distributions, in Constraints in Phonological Acquisition, pp. 245-291.
[57]. Ezer RASIN and Roni KATZIR (2016), On Evaluation Metrics in Optimality Theory, Linguistic Inquiry, 47(2):235-282.
[58]. Shauli RAVFOGEL, Yoav GOLDBERG, and Tal LINZEN (2019), Studying the Inductive Biases of RNNs with Synthetic Variations of Natural Languages, in Proceedings of NAACL-HLT, pp. 3532-3542.
[59]. Sharon ROSE and Rachel WALKER (2011), Harmony Systems, The Handbook of Phonological Theory, 2:240-290.
[60]. D. E. RUMELHART and J. L. MCCLELLAND (1986), On Learning the Past Tenses of English Verbs, in J. L. MCCLELLAND and D. E. RUMELHART, editors, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, volume 2: Psychological and Biological Models, pp. 216-271, The MIT Press.
[61]. Edward SAPIR and Harry HOIJER (1967), The Phonology and Morphology of the Navaho Language, University of California Press.
[62]. Hava T. SIEGELMANN (1999), Neural Networks and Analog Computation: Beyond the Turing Limit, Springer Science & Business Media.
[63]. Caitlin SMITH, Charlie O’HARA, Eric ROSEN, and Paul SMOLENSKY (2021), Emergent Gestural Scores in a Recurrent Neural Network Model of Vowel Harmony, Proceedings of the Society for Computation in Linguistics, 4(1):61-70.
[64]. Ilya SUTSKEVER, Oriol VINYALS, and Quoc V. LE (2014), Sequence to Sequence Learning with Neural Networks, in Advances in Neural Information Processing Systems, pp. 3104-3112.
[65]. David S. TOURETZKY (1989), Towards a Connectionist Phonology: The “Many Maps” Approach to Sequence Manipulation, in Proceedings of the 11th Annual Conference of the Cognitive Science Society, pp. 188-195.
[66]. David S. TOURETZKY and Deirdre W. WHEELER (1990), A Computational Basis for Phonology, in Advances in Neural Information Processing Systems, pp. 372-379.
[67]. Paul TUPPER and Bobak SHAHRIARI (2016), Which Learning Algorithms Can Generalize Identity-Based Rules to Novel Inputs?, arXiv preprint arXiv:1605.04002.
[68]. Gail WEISS, Yoav GOLDBERG, and Eran YAHAV (2018), On the Practical Computational Power of Finite Precision RNNs for Language Recognition, in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 740-745.
[69]. Gesche WESTPHAL-FITCH, Beatrice GIUSTOLISI, Carlo CECCHETTO, Jordan Scott MARTIN, and W. Tecumseh FITCH (2018), Artificial Grammar Learning Capabilities in a Visual Task Match Requirements for Linguistic Syntax, Frontiers in Psychology, 9:1210.
[70]. Colin WILSON (2003), Analyzing Unbounded Spreading with Constraints: Marks, Targets, and Derivations, Unpublished manuscript, University of California, Los Angeles.

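Notes

Record prepared with funds from MNiSW (the Polish Ministry of Science and Higher Education), agreement No. 461252, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science), module: Popularisation of science and promotion of sport (2021).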

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-c77be6d2-a71c-4397-8936-b28590cfb0a1