Article title

Towards textual data augmentation for neural networks: synonyms and maximum loss

Publication languages
EN
Abstracts
EN
Data augmentation is one way to deal with the scarcity of labeled data and with overfitting. Both problems are crucial for modern deep-learning algorithms, which require massive amounts of data. The topic is better explored for image analysis than for text; this work is a step toward closing that gap. We propose a method for augmenting textual data when training convolutional neural networks for sentence classification. The augmentation is based on word substitution using a thesaurus as well as Princeton University's WordNet. Our method improves upon the baseline in most cases; in terms of accuracy, the best variant outperforms the baseline by 1.2 percentage points.
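The abstract describes substitution-based augmentation: replacing words in a training sentence with synonyms drawn from a thesaurus or WordNet. As a rough illustration only, below is a minimal sketch of WordNet-driven synonym substitution using NLTK; the replacement probability p and the helper names are assumptions made for this sketch, not the authors' exact procedure (which, per the title, also involves a maximum-loss criterion not shown here).

```python
import random

# Requires: pip install nltk, then nltk.download('wordnet') once.
from nltk.corpus import wordnet


def synonym_candidates(word):
    """Collect distinct single-word WordNet synonyms for `word`."""
    synonyms = set()
    for synset in wordnet.synsets(word):
        for lemma in synset.lemmas():
            name = lemma.name().replace('_', ' ')
            # Skip the word itself and multi-word lemmas.
            if name.lower() != word.lower() and ' ' not in name:
                synonyms.add(name)
    return sorted(synonyms)


def augment(sentence, p=0.2, rng=random):
    """Return a copy of `sentence` in which each word is independently
    replaced by a randomly chosen WordNet synonym with probability `p`."""
    words = []
    for word in sentence.split():
        candidates = synonym_candidates(word)
        if candidates and rng.random() < p:
            words.append(rng.choice(candidates))
        else:
            words.append(word)
    return ' '.join(words)


# Example: generate a few augmented variants of one labeled sentence.
for _ in range(3):
    print(augment("the movie was a surprisingly good thriller"))
```

Running augment several times per training example yields multiple paraphrase-like variants with the same label, which is the usual way such substitution is used to enlarge a training set.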
Pages
57-83
Physical description
Bibliography: 53 items, figures, tables
Creators
  • AGH University of Science and Technology, Faculty of Computer Science, Electronics and Telecommunications, Department of Computer Science, Krakow, Poland
  • AGH University of Science and Technology, Faculty of Computer Science, Electronics and Telecommunications, Department of Computer Science, Krakow, Poland
Bibliography
  • [1] Al-Rfou R., Choe D., Constant N., Guo M., Jones L.: Character-Level Language Modeling with Deeper Self-Attention, CoRR, vol. abs/1808.04444, 2018. http://arxiv.org/abs/1808.04444.
  • [2] app.dimensions.ai website. https://app.dimensions.ai.
  • [3] Bojanowski P., Grave E., Joulin A., Mikolov T.: Enriching Word Vectors with Subword Information, Transactions of the Association for Computational Linguistics, vol. 5, pp. 135-146, 2017.
  • [4] Bottou L.: Large-scale machine learning with stochastic gradient descent. In: Proceedings of COMPSTAT'2010, pp. 177-186. Springer, 2010.
  • [5] Ciresan D., Meier U., Masci J., Gambardella L.M., Schmidhuber J.: Flexible, high performance convolutional neural networks for image classification. In: Twenty-Second International Joint Conference on Artificial Intelligence, pp. 1237-1242, 2011.
  • [6] Ciresan D., Meier U., Schmidhuber J.: Multi-column deep neural networks for image classification. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3642-3649, 2012.
  • [7] Collobert R., Weston J., Bottou L., Karlen M., Kavukcuoglu K., Kuksa P.: Natural Language Processing (almost) from Scratch, Journal of Machine Learning Research, vol. 12, pp. 2493-2537, 2011.
  • [8] Coulombe C.: Text Data Augmentation Made Simple By Leveraging NLP Cloud APIs, CoRR, vol. abs/1812.04718, 2018. http://arxiv.org/abs/1812.04718.
  • [9] Devlin J., Chang M.W., Lee K., Toutanova K.: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, CoRR, vol. abs/1810.04805, 2018. http://arxiv.org/abs/1810.04805.
  • [10] Dietterich T.G.: Approximate statistical tests for comparing supervised classification learning algorithms, Neural Computation, vol. 10(7), pp. 1895-1923, 1998.
  • [11] Dodge J., Gane A., Zhang X., Bordes A., Chopra S., Miller A., Szlam A., Weston J.: Evaluating Prerequisite Qualities for Learning End-to-End Dialog Systems, CoRR, 2015. https://arxiv.org/abs/1511.06931.
  • [12] Fausett L.: Fundamentals of Neural Networks: Architectures, Algorithms, and Applications, Prentice-Hall, Inc., Upper Saddle River, NJ, USA, 1994.
  • [13] Fawzi A., Samulowitz H., Turaga D., Frossard P.: Adaptive data augmentation for image classification. In: 2016 IEEE International Conference on Image Processing (ICIP), pp. 3688-3692, 2016.
  • [14] Gehring J., Auli M., Grangier D., Yarats D., Dauphin Y.N.: Convolutional sequence to sequence learning. In: Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 1243-1252, 2017.
  • [15] Goodfellow I., Bengio Y., Courville A.: Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.
  • [16] Harvard NLP, Kim CNN implementation. https://github.com/harvardnlp/sent-conv-torch.
  • [17] Kim Y.: Convolutional Neural Networks for Sentence Classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746-1751. Association for Computational Linguistics, 2014. http://dx.doi.org/10.3115/v1/D14-1181.
  • [18] Kingma D.P., Ba J.: Adam: A Method for Stochastic Optimization, CoRR, vol. abs/1412.6980, 2015. https://arxiv.org/abs/1412.6980.
  • [19] Kobayashi S.: Contextual Augmentation: Data Augmentation by Words with Paradigmatic Relations. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), pp. 452-457. Association for Computational Linguistics, 2018. http://dx.doi.org/10.18653/v1/N18-2072.
  • [20] Kobayashi S.: CNN implementation. https://github.com/pfnet-research/contextual_augmentation.
  • [21] Krizhevsky A., Sutskever I., Hinton G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems 25 (NIPS 2012), pp. 1097-1105, 2012.
  • [22] LeCun Y.: Une procédure d'apprentissage pour réseau à seuil asymétrique. In: Proceedings of Cognitiva 85, pp. 599-604, 1985.
  • [23] LeCun Y., Bengio Y., Hinton G.: Deep learning, Nature, vol. 521(7553), p. 436, 2015.
  • [24] Li X., Roth D.: Learning question classifiers. In: Proceedings of the 19th International Conference on Computational Linguistics - Volume 1, pp. 1-7. Association for Computational Linguistics, 2002.
  • [25] Lowe R., Pow N., Serban I., Pineau J.: The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems. In: Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pp. 285-294. Association for Computational Linguistics, 2015. http://dx.doi.org/10.18653/v1/W15-4640.
  • [26] Manning C.D.: Computational linguistics and deep learning, Computational Linguistics, vol. 41(4), pp. 701-707, 2015.
  • [27] Mikolov T., Chen K., Corrado G., Dean J.: Efficient Estimation of Word Representations in Vector Space, CoRR, vol. abs/1301.3781, 2013. http://arxiv.org/abs/1301.3781.
  • [28] Mikolov T., Sutskever I., Chen K., Corrado G.S., Dean J.: Distributed Representations of Words and Phrases and their Compositionality. In: Advances in neural information processing systems 26 (NIPS 2013), pp. 3111-3119, 2013.
  • [29] Miller G.A.: WordNet: An electronic lexical database. MIT Press, 1998.
  • [30] Miller G.A.: WordNet: a lexical database for English, Communications of the ACM, vol. 38(11), pp. 39-41, 1995.
  • [31] Parker D.B.: Learning Logic, Technical Report TR-47, Center for Computational Research in Economics and Management Science, Massachusetts Institute of Technology, Cambridge, 1985.
  • [32] Paulin M., Revaud J., Harchaoui Z., Perronnin F., Schmid C.: Transformation pursuit for image classification. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3646-3653, 2014.
  • [33] Pennington J., Socher R., Manning C.: GloVe: Global Vectors for Word Representation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp. 1532-1543, 2014.
  • [34] Ptaszyński M., Leliwa G., Piech M., Smywiński-Pohl A.: Cyberbullying Detection - Technical Report 2/2018, Department of Computer Science, AGH University of Science and Technology, CoRR, vol. abs/1808.00926, 2018. http://arxiv.org/abs/1808.00926.
  • [35] PyDictionary. http://pypi.org/project/PyDictionary/.
  • [36] Quijas J.K.: Analysing the effects of data augmentation and free parameters for text classification with recurrent convolutional neural networks, Master's thesis, The University of Texas at El Paso, 2017.
  • [37] Ratner A.J., Ehrenberg H., Hussain Z., Dunnmon J., Re C.: Learning to Compose Domain-Specific Transformations for Data Augmentation. In: Advances in Neural Information Processing Systems, pp. 3239-3249, 2017.
  • [38] Rosario R.R.: A Data Augmentation Approach to Short Text Classification, Ph.D. thesis, University of California, Los Angeles, 2017.
  • [39] Rumelhart D.E., Hinton G.E., Williams R.J.: Learning representations by back-propagating errors, Nature, vol. 323(6088), pp. 533-536, 1986.
  • [40] Simard P.Y., Steinkraus D., Platt J.C.: Best practices for convolutional neural networks applied to visual document analysis. In: Proceedings of Seventh International Conference on Document Analysis and Recognition, 2003, Edinburgh, UK, vol. 3, pp. 958-962, 2003.
  • [41] Socher R., Lin C.C., Ng A.Y., Manning C.: Parsing natural scenes and natural language with recursive neural networks. In: Proceedings of the 28th international conference on machine learning (ICML-11), pp. 129-136, 2011.
  • [42] Srivastava N., Hinton G.E., Krizhevsky A., Sutskever I., Salakhutdinov R.: Dropout: a simple way to prevent neural networks from overfitting, Journal of Machine Learning Research, vol. 15(1), pp. 1929-1958, 2014.
  • [43] Thesaurus.com. www.thesaurus.com.
  • [44] Toutanova K., Klein D., Manning C.D., Singer Y.: Feature-rich part-of-speech tagging with a cyclic dependency network. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1, pp. 173-180. Association for Computational Linguistics, 2003.
  • [45] Vaswani A., Shazeer N., Parmar N., Uszkoreit J., Jones L., Gomez A.N., Kaiser L., Polosukhin I.: Attention is all you need. In: Advances in Neural Information Processing Systems 30 (NIPS 2017), pp. 5998-6008, 2017.
  • [46] Werbos P.: Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences, PhD thesis, Harvard University, 1974.
  • [47] Wong S.C., Gatt A., Stamatescu V., McDonnell M.D.: Understanding data augmentation for classification: when to warp? In: 2016 International Conference on Digital Image Computing: Techniques and Applications (DICTA), pp. 1-6, 2016.
  • [48] WordNet online. wordnet.princeton.edu.
  • [49] Wu Y., Schuster M., Chen Z., Le Q.V., Norouzi M., Macherey W., Krikun M., Cao Y., Gao Q., Macherey K., Klingner J., Shah A., Johnson M., Liu X., Kaiser L., Gouws S., Kato Y., Kudo T., Kazawa H., Stevens K., Kurian G., Patil N., Wang W., Young C., Smith J., Riesa J., Rudnick A., Vinyals O., Corrado G., Hughes M., Dean J.: Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, CoRR, vol. abs/1609.08144, 2016. http://arxiv.org/abs/1609.08144.
  • [50] Young T., Hazarika D., Poria S., Cambria E.: Recent trends in deep learning based natural language processing, IEEE Computational intelligence magazine, vol. 13(3), pp. 55-75, 2018.
  • [51] Zeiler M.D.: ADADELTA: an adaptive learning rate method, arXiv preprint arXiv:1212.5701, 2012.
  • [52] Zhang X., Zhao J., LeCun Y.: Character-level convolutional networks for text classification. In: Advances in neural information processing systems, pp. 649-657, 2015.
  • [53] Zhou X., Dong D., Wu H., Zhao S., Yu D., Tian H., Liu X., Yan R.: Multi-view response selection for human-computer conversation. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pp. 372-381, 2016.
YADDA identifier
bwmeta1.element.baztech-56c51ccc-d18a-410e-aa00-f0e3340ae317