Article title

How to keep the HG weights non-negative: the truncated Perceptron reweighing rule

Authors
Content
Identifiers
Title variants
Publication languages
EN
Abstracts
EN
The literature on error-driven learning in Harmonic Grammar (HG) has adopted the Perceptron reweighing rule. Yet this rule is not suited to HG, as it fails to ensure non-negative weights. A variant is therefore considered which truncates the updates at zero, keeping the weights non-negative. Convergence guarantees and error bounds for the original Perceptron are shown to extend to its truncated variant.
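
For concreteness, the update rule the abstract describes can be sketched in a few lines of Python. This is a minimal illustration under the usual error-driven HG setting; the function name, the learning rate eta, and the violation vectors in the example are illustrative assumptions, not details taken from the article.

    import numpy as np

    def truncated_perceptron_update(weights, winner_violations, loser_violations, eta=1.0):
        # Standard Perceptron step: promote constraints violated more by
        # the loser, demote constraints violated more by the winner.
        update = eta * (loser_violations - winner_violations)
        # Truncation at zero: clip componentwise so no weight goes negative.
        return np.maximum(weights + update, 0.0)

    # Example with three constraints: the loser violates constraint 0 more,
    # the winner violates constraint 2 more.
    w = np.array([0.5, 1.0, 0.2])
    winner = np.array([0.0, 1.0, 2.0])
    loser = np.array([2.0, 1.0, 0.0])
    w = truncated_perceptron_update(w, winner, loser)
    # w is now [2.5, 1.0, 0.0]; the untruncated rule would have driven
    # the last weight to -1.8.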
Year
Pages
345--375
Physical description
Bibliography: 40 items, figures, tables, charts.
Contributors
author
  • SFL (CNRS and University of Paris 8), France
  • UiL-OTS (Utrecht University)
Bibliography
  • [1] Tamás Sándor Bíró (2006), Finding the right words: Implementing Optimality Theory with Simulated Annealing, Ph.D. thesis, University of Groningen, available as ROA-896.
  • [2] Hans-Dieter Block (1962), The perceptron: A model of brain functioning, Reviews of Modern Physics, 34 (1): 123-135.
  • [3] Paul Boersma (1997), How we learn variation, optionality and probability, in Rob van Son, editor, Proceedings of the Institute of Phonetic Sciences (IFA) 21, pp. 43-58, Institute of Phonetic Sciences, University of Amsterdam.
  • [4] Paul Boersma (1998), Functional Phonology, Ph.D. thesis, University of Amsterdam, The Netherlands, Holland Academic Graphics.
  • [5] Paul Boersma and Bruce Hayes (2001), Empirical tests for the Gradual Learning Algorithm, Linguistic Inquiry, 32 (1): 45-86.
  • [6] Paul Boersma and Joe Pater (to appear), Convergence properties of a gradual learner for Harmonic Grammar, in John McCarthy and Joe Pater, editors, Harmonic Grammar and Harmonic Serialism, Equinox Press.
  • [7] Paul Boersma and Jan-Willem van Leussen (2014), Fast evaluation and learning in multi-level parallel constraint grammars, University of Amsterdam.
  • [8] Nicolò Cesa-Bianchi and Gábor Lugosi (2006), Prediction, learning, and games, Cambridge University Press.
  • [9] Andries W. Coetzee and Shigeto Kawahara (2013), Frequency biases in phonological variation, Natural Language and Linguistic Theory, 31 (1): 47-89.
  • [10] Andries W. Coetzee and Joe Pater (2008), Weighted constraints and gradient restrictions on place co-occurrence in Muna and Arabic, Natural Language and Linguistic Theory, 26 (2): 289-337.
  • [11] Andries W. Coetzee and Joe Pater (2011), The place of variation in phonological theory, in John Goldsmith, Jason Riggle, and Alan Yu, editors, Handbook of phonological theory, pp. 401-434, Blackwell.
  • [12] Nello Cristianini and John Shawe-Taylor (2000), An introduction to Support Vector Machines and other kernel-based methods, Cambridge University Press.
  • [13] Robert Frank and Shyam Kapur (1996), On the use of triggers in parameter setting, Linguistic Inquiry, 27 (4): 623-660.
  • [14] Yoav Freund and Robert E. Schapire (1999), Large margin classification using the Perceptron algorithm, Machine Learning, 37 (3): 277-296.
  • [15] Edward Gibson and Kenneth Wexler (1994), Triggers, Linguistic Inquiry, 25 (3): 407-454.
  • [16] Bruce Hayes (2004), Phonological acquisition in Optimality Theory: The early stages, in René Kager, Joe Pater, and Wim Zonneveld, editors, Constraints in phonological acquisition, pp. 158-203, Cambridge University Press.
  • [17] Gaja Jarosz (2013), Learning with hidden structure in Optimality Theory and Harmonic Grammar: Beyond Robust Interpretative Parsing, Phonology, 30 (1): 27-71.
  • [18] Karen Jesney and Anne-Michelle Tessier (2011), Biases in Harmonic Grammar: the road to restrictive learning, Natural Language and Linguistic Theory, 29 (1): 251-290.
  • [19] Frank Keller (2000), Gradience in grammar. Experimental and computational aspects of degrees of grammaticality, Ph.D. thesis, University of Edinburgh, Scotland.
  • [20] Jyrki Kivinen (2003), Online learning of linear classifiers, in Shahar Mendelson and Alexander J. Smola, editors, Advanced lectures on Machine Learning (LNAI 2600), pp. 235-257, Springer.
  • [21] Jyrki Kivinen, Manfred K. Warmuth, and Peter Auer (1997), The Perceptron algorithm versus Winnow: linear versus logarithmic mistake bounds when few input variables are relevant, Artificial Intelligence, 97 (1-2): 325-343.
  • [22] Norbert Klasner and Hans-Ulrich Simon (1995), From noise-free to noise-tolerant and from on-line to batch learning, in Wolfgang Maass, editor, Computational Learning Theory (COLT) 8, pp. 250-257, ACM.
  • [23] Gèraldine Legendre, Yoshiro Miyata, and Paul Smolensky (1998a), Harmonic Grammar: A formal multi-level connectionist theory of linguistic well-formedness: An application, in Morton Ann Gernsbacher and Sharon J. Derry, editors, Annual conference of the Cognitive Science Society 12, pp. 884-891, Lawrence Erlbaum Associates.
  • [24] Géraldine Legendre, Yoshiro Miyata, and Paul Smolensky (1998b), Harmonic Grammar: A formal multi-level connectionist theory of linguistic well-formedness: Theoretical foundations, in Morton Ann Gernsbacher and Sharon J. Derry, editors, Annual conference of the Cognitive Science Society 12, pp. 388-395, Lawrence Erlbaum.
  • [25] Gèraldine Legendre, Antonella Sorace, and Paul Smolensky (2006), The Optimality Theory/Harmonic Grammar connection, in Paul Smolensky and Gèraldine Legendre, editors, The Harmonic Mind, pp. 903-966, MIT Press.
  • [26] Nick Littlestone (1988), Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm, Machine Learning, 2 (4): 285-318.
  • [27] Giorgio Magri (2015), Idempotency in Optimality Theory, manuscript.
  • [28] Giorgio Magri (to appear), Error-driven learning in OT and HG: a comparison, Phonology.
  • [29] Marvin Minsky and Seymour Papert (1969), Perceptrons: An introduction to Computational Geometry, MIT Press.
  • [30] Mehryar Mohri and Afshin Rostamizadeh (2013), Perceptron Mistake bounds, arXiv:1305.0208.
  • [31] Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar (2012), Foundations of Machine Learning, MIT Press.
  • [32] Albert B. J. Novikoff (1962), On convergence proofs on Perceptrons, in Proceedings of the symposium on the mathematical theory of automata, volume XII, pp. 615-622.
  • [33] Joe Pater (2008), Gradual learning and convergence, Linguistic Inquiry, 39 (2): 334-345.
  • [34] Alan Prince (2002), Entailed Ranking Arguments, ms., Rutgers University, New Brunswick, NJ. Rutgers Optimality Archive, ROA 500. Available at http://www.roa.rutgers.edu.
  • [35] Alan Prince and Bruce Tesar (2004), Learning phonotactic distributions, in René Kager, Joe Pater, and Wim Zonneveld, editors, Constraints in phonological acquisition, pp. 245-291, Cambridge University Press.
  • [36] Frank Rosenblatt (1958), The Perceptron: A probabilistic model for information storage and organization in the brain, Psychological Review, 65 (6): 386-408.
  • [37] Frank Rosenblatt (1962), Principles of Neurodynamics, Spartan.
  • [38] Shai Shalev-Shwartz and Yoram Singer (2005), A new perspective on an old Perceptron algorithm, in Peter Auer and Ron Meir, editors, Conference on Computational Learning Theory (COLT) 18, Lecture Notes in Computer Science, pp. 264-278, Springer.
  • [39] Paul Smolensky and Géraldine Legendre (2006), The Harmonic Mind, MIT Press.
  • [40] Kenneth Wexler and Peter W. Culicover (1980), Formal principles of language acquisition, MIT Press, Cambridge, MA.
Remarks
Record developed with funds from MNiSW under agreement No. 461252, as part of the "Społeczna odpowiedzialność nauki" ("Social Responsibility of Science") programme, module: Popularization of science and promotion of sport (2020).
Document type
Bibliography
YADDA identifier
bwmeta1.element.baztech-37bcd932-3358-477b-8f8e-23bbd26f9912