Character-based recurrent neural networks for morphological relational reasoning

Mogren, Olof; Johansson, Richard

doi:10.15398/jlm.v7i1.218

Article - details

Article title

Character-based recurrent neural networks for morphological relational reasoning

Authors

Mogren Olof, Johansson Richard

Content

Full texts:

Mogren_Character-based recurrent neural networks_1_2019.pdf

Identifiers

DOI

10.15398/jlm.v7i1.218

Abstract

We present a model for predicting inflected word forms based on morphological analogies. Previous work includes rule-based algorithms that determine and copy affixes from one word to another, with limited support for varying inflectional patterns. In related tasks such as morphological reinflection, the algorithm is provided with an explicit enumeration of morphological features which may not be available in all cases. In contrast, our model is feature-free: instead of explicitly representing morphological features, the model is given a demo pair that implicitly specifies a morphological relation (such as write: writes specifying infinitive:present). Given this demo relation and a query word (e.g. watch), the model predicts the target word (e.g. watches). To address this task, we devise a character-based recurrent neural network architecture using three separate encoders and one decoder. Our experimental evaluation on five different languages shows that the exact form can be predicted with high accuracy, consistently beating the baseline methods. Particularly, for English the prediction accuracy is 94.85%. The solution is not limited to copying affixes from the demo relation, but generalizes to words with varying inflectional patterns, and can abstract away from the orthographic level to the level of morphological forms.
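The limitation of affix copying mentioned in the abstract can be illustrated with a small sketch. This is not the authors' baseline or model: `copy_suffix` is a hypothetical helper that assumes purely suffixal inflection, reads the suffix change off the demo pair, and applies it to the query word unchanged.

```python
import os


def copy_suffix(demo_source: str, demo_target: str, query: str) -> str:
    """Apply the suffix change seen in the demo pair to the query word.

    Hypothetical illustration only: assumes the relation is expressed
    entirely by a change at the end of the word.
    """
    # Longest common prefix of the demo pair, e.g. "write" for write:writes.
    stem = os.path.commonprefix([demo_source, demo_target])
    removed = demo_source[len(stem):]  # suffix dropped from the source form
    added = demo_target[len(stem):]    # suffix attached in the target form
    # Strip the same suffix from the query if present, then attach the new one.
    if removed and query.endswith(removed):
        query = query[: -len(removed)]
    return query + added


if __name__ == "__main__":
    # Same inflectional pattern as the demo pair: the copy works.
    print(copy_suffix("write", "writes", "read"))
    # Different pattern: the expected form is "watches", but the copy
    # only appends the "s" seen in the demo pair.
    print(copy_suffix("write", "writes", "watch"))
```

With the demo pair write:writes, the sketch returns "reads" for the query read but "watchs" for the query watch, whereas the correct target is "watches". This is the kind of varying inflectional pattern that the abstract says the neural model generalizes over.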

Keywords

morphological analogies; morphological inflection; morphological reinflection; recurrent neural network; character-based modelling

Publisher

Instytut Podstaw Informatyki PAN

Journal

Journal of Language Modelling

Year

2019

Volume

Vol. 7, No. 1

Pages

139-170

Physical description

Bibliography: 45 items, figures, tables, charts.

Contributors

author

Mogren Olof

olof@mogren.one

RISE Research Institutes of Sweden

author

Johansson Richard

richard.johansson@gu.se

University of Gothenburg, Sweden

References

[1] Malin Ahlberg, Markus Forsberg, and Mans Hulden (2015), Paradigm classification in supervised learning of morphology, in Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1024-1029, Association for Computational Linguistics, Denver, United States, doi: 10.3115/v1/N15-1107.
[2] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio (2015), Neural machine translation by jointly learning to align and translate, in Proceedings of the 3rd International Conference on Learning Representations, ICLR, Conference Track Proceedings, San Diego, United States.
[3] Yoshua Bengio, Patrice Simard, and Paolo Frasconi (1994), Learning long-term dependencies with gradient descent is difficult, IEEE Transactions on Neural Networks, 5 (2): 157-166, doi: 10.1109/72.279181.
[4] Joachim Bingel and Anders Søgaard (2017), Identifying beneficial task relations for multi-task learning in deep neural networks, in Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pp. 164-169, Association for Computational Linguistics, Valencia, Spain.
[5] Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov (2017), Enriching word vectors with subword information, Transactions of the Association for Computational Linguistics, 5: 135-146, doi: 10.1162/tacl_a_00051.
[6] Lars Borin, Markus Forsberg, and Lennart Lönngren (2013), SALDO: a touch of yin to WordNet’s yang, Language Resources and Evaluation, 47 (4): 1191-1211, doi: 10.1007/s10579-013-9233-4.
[7] Rich Caruana (1998), Multitask learning, in Learning to Learn, pp. 95-133, Springer US, Boston, MA, doi: 10.1007/978-1-4615-5529-2_5.
[8] Abhisek Chakrabarty, Onkar Arun Pandit, and Utpal Garain (2017), Context sensitive lemmatization using two successive bidirectional gated recurrent networks, in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1481-1491, Association for Computational Linguistics, Vancouver, Canada, doi: 10.18653/v1/P17-1136.
[9] Kyunghyun Cho, Bart van Merriënboer, Dzmitry Bahdanau, and Yoshua Bengio (2014a), On the properties of neural machine translation: Encoder-decoder approaches, in Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, pp. 103-111, Association for Computational Linguistics, Doha, Qatar, doi: 10.3115/v1/W14-4012.
[10] Kyunghyun Cho, Bart van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio (2014b), Learning phrase representations using RNN encoder-decoder for statistical machine translation, in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1724-1734, Association for Computational Linguistics, Doha, Qatar, doi: 10.3115/v1/D14-1179.
[11] Grzegorz Chrupała, Georgiana Dinu, and Josef Van Genabith (2008), Learning morphology with Morfette, in Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC’08), pp. 2362-2367, European Language Resources Association (ELRA), Marrakech, Morocco.
[12] Ronan Collobert and Jason Weston (2008), A unified architecture for natural language processing: Deep neural networks with multitask learning, in Proceedings of the 25th International Conference on Machine Learning, ICML’08, pp. 160-167, ACM, Helsinki, Finland, doi: 10.1145/1390156.1390177.
[13] Ryan Cotterell, Christo Kirov, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Patrick Xia, Manaal Faruqui, Sandra Kübler, David Yarowsky, Jason Eisner, and Mans Hulden (2017), CoNLL-SIGMORPHON 2017 shared task: universal morphological reinflection in 52 languages, in Proceedings of the CoNLL SIGMORPHON 2017 Shared Task: Universal Morphological Reinflection, pp. 1-30, Association for Computational Linguistics, Vancouver, Canada.
[14] Ryan Cotterell, Christo Kirov, John Sylak-Glassman, David Yarowsky, Jason Eisner, and Mans Hulden (2016a), The SIGMORPHON 2016 shared task – morphological reinflection, in Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, pp. 10-22, Association for Computational Linguistics, Berlin, Germany, doi: 10.18653/v1/W16-2002.
[15] Ryan Cotterell, Hinrich Schütze, and Jason Eisner (2016b), Morphological smoothing and extrapolation of word embeddings, in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1651-1660, Association for Computational Linguistics, Berlin, Germany.
[16] Markus Dreyer and Jason Eisner (2011), Discovering morphological paradigms from plain text using a Dirichlet process mixture model, in Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pp. 616-627, Association for Computational Linguistics, Edinburgh, United Kingdom.
[17] Greg Durrett and John DeNero (2013), Supervised learning of complete morphological paradigms, in Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 1185-1195, Association for Computational Linguistics, Atlanta, United States.
[18] Manaal Faruqui, Yulia Tsvetkov, Graham Neubig, and Chris Dyer (2016), Morphological inflection generation using character sequence to sequence learning, in Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 634-643, Association for Computational Linguistics, San Diego, United States, doi: 10.18653/v1/N16-1077.
[19] Dedre Gentner, Keith James Holyoak, and Boicho N. Kokinov (2001), The analogical mind: Perspectives from cognitive science, MIT Press.
[20] Sepp Hochreiter (1998), The vanishing gradient problem during learning recurrent neural nets and problem solutions, Int. J. Uncertain. Fuzziness Knowl.-Based Syst., 6 (2): 107-116, doi: 10.1142/S0218488598000094.
[21] Bart Jongejan and Hercules Dalianis (2009), Automatic training of lemmatization rules that handle morphological changes in pre-, in- and suffixes alike, in Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pp. 145-153, Association for Computational Linguistics, Suntec, Singapore.
[22] Jakub Kanis and Luděk Müller (2005), Automatic lemmatizer construction with focus on OOV words lemmatization, in Text, Speech and Dialogue, pp. 132-139, Springer Berlin Heidelberg, Berlin, Heidelberg.
[23] Katharina Kann and Hinrich Schütze (2016), MED: The LMU System for the SIGMORPHON 2016 shared task on morphological reinflection, in Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology, pp. 62-70, Association for Computational Linguistics, Berlin, Germany, doi: 10.18653/v1/W16-2010.
[24] Yoon Kim, Yacine Jernite, David Sontag, and Alexander M. Rush (2016), Character-aware neural language models, in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, AAAI’16, pp. 2741-2749, AAAI Press, Phoenix, United States.
[25] Diederik Kingma and Jimmy Ba (2015), Adam: a method for stochastic optimization, in Proceedings of the 3rd International Conference on Learning Representations, ICLR, Conference Track Proceedings, San Diego, United States.
[26] Kimmo Koskenniemi (1984), A general computational model for word-form recognition and production, in 10th International Conference on Computational Linguistics and 22nd Annual Meeting of the Association for Computational Linguistics, pp. 178-181, Association for Computational Linguistics, Stanford, United States, doi: 10.3115/980491.980529.
[27] Yves Lepage (1998), Solving analogies on words: an algorithm, in Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1, pp. 728-734, Association for Computational Linguistics, Montreal, Canada, doi: 10.3115/980845.980967.
[28] Minh-Thang Luong and Christopher D. Manning (2016), Achieving open vocabulary neural machine translation with hybrid word-character models, in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1054-1063, Association for Computational Linguistics, Berlin, Germany, doi: 10.18653/v1/P16-1100.
[29] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean (2013a), Efficient estimation of word representations in vector space, in Proceedings of the International Conference on Learning Representations (ICLR), Workshop Track, Scottsdale, United States.
[30] Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean (2013b), Distributed representations of words and phrases and their compositionality, in Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, NIPS’13, pp. 3111-3119, Curran Associates Inc., Lake Tahoe, United States.
[31] Tomas Mikolov, Wen-tau Yih, and Geoffrey Zweig (2013c), Linguistic regularities in continuous space word representations, in Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 746-751, Association for Computational Linguistics, Atlanta, United States.
[32] Andriy Mnih and Koray Kavukcuoglu (2013), Learning word embeddings efficiently with noise-contrastive estimation, in Advances in Neural Information Processing Systems 26, pp. 2265-2273, Curran Associates, Inc.
[33] Garrett Nicolai, Colin Cherry, and Grzegorz Kondrak (2015a), Inflection generation as discriminative string transduction, in Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 922-931, Association for Computational Linguistics, Denver, United States, doi: 10.3115/v1/N15-1093.
[34] Garrett Nicolai, Colin Cherry, and Grzegorz Kondrak (2015b), Morpho-syntactic regularities in continuous word representations: A multilingual study, in Proceedings of the 1st Workshop on Vector Space Modeling for Natural Language Processing, pp. 129-134, Association for Computational Linguistics, Denver, United States, doi: 10.3115/v1/W15-1518.
[35] Jeffrey Pennington, Richard Socher, and Christopher Manning (2014), GloVe: global vectors for word representation, in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532-1543, Association for Computational Linguistics, Doha, Qatar, doi: 10.3115/v1/D14-1162.
[36] Graeme D. Ritchie, Graham J. Russell, Alan W. Black, and Stephen G. Pulman (1991), Computational morphology: practical mechanisms for the English lexicon, ACL-MIT Series in Natural Language Processing, MIT Press, Cambridge, United States.
[37] Jürgen Schmidhuber and Sepp Hochreiter (1997), Long short-term memory, Neural Computation, 9 (8): 1735-1780, doi: 10.1162/neco.1997.9.8.1735.
[38] Rico Sennrich, Barry Haddow, and Alexandra Birch (2016), Neural machine translation of rare words with subword units, in Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1715-1725, Association for Computational Linguistics, Berlin, Germany, doi: 10.18653/v1/P16-1162.
[39] Nicolas Stroppa and François Yvon (2005), An analogical learner for morphological analysis, in Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005), pp. 120-127, Association for Computational Linguistics, Ann Arbor, United States.
[40] Ilya Sutskever, Oriol Vinyals, and Quoc V. Le (2014), Sequence to sequence learning with neural networks, in Advances in Neural Information Processing Systems 27, pp. 3104-3112, Curran Associates, Inc.
[41] Robert A. Wagner and Michael J. Fischer (1974), The string-to-string correction problem, J. ACM, 21 (1): 168-173, doi: 10.1145/321796.321811.
[42] Changfeng Wang, Santosh S. Venkatesh, and J. Stephen Judd (1994), Optimal stopping and effective machine complexity in learning, in Advances in Neural Information Processing Systems 6, pp. 303-310, Morgan-Kaufmann.
[43] David Yarowsky and Richard Wicentowski (2000), Minimally supervised morphological analysis by multimodal alignment, in Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, pp. 207-216, Association for Computational Linguistics, Hong Kong, doi: 10.3115/1075218.1075245.
[44] François Yvon (1997), Paradigmatic cascades: a linguistically sound model of pronunciation by analogy, in Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics, pp. 428-435, Association for Computational Linguistics, Madrid, Spain, doi: 10.3115/976909.979672.
[45] Xiang Zhang, Junbo Zhao, and Yann LeCun (2015), Character-level convolutional networks for text classification, in C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural Information Processing Systems 28, pp. 649-657, Curran Associates, Inc.

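Notes

Record prepared with funding from MNiSW (the Polish Ministry of Science and Higher Education), agreement No. 461252, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science), module: Popularisation of science and promotion of sport (2020).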

Document type

Bibliography

YADDA identifier

bwmeta1.element.baztech-522a55e6-67f3-43ed-bd78-7c70afa16562