Gradient Regularization Improves Accuracy of Discriminative Models

Varga, Dániel; Csiszárik, Adrián; Zombori, Zsolt

doi:10.4467/20838476SI.18.003.10408

Artykuł - szczegóły

Tytuł artykułu

Gradient Regularization Improves Accuracy of Discriminative Models

Autorzy

Varga Dániel , Csiszárik Adrián , Zombori Zsolt

Treść / Zawartość

Pełne teksty:

Pobierz

Identyfikatory

DOI

10.4467/20838476SI.18.003.10408

Warianty tytułu

Języki publikacji

Abstrakty

Regularizing the gradient norm of the output of a neural network is a powerful technique, rediscovered several times. This paper presents evidence that gradient regularization can consistently improve classification accuracy on vision tasks, using modern deep neural networks, especially when the amount of training data is small. We introduce our regularizers as members of a broader class of Jacobian-based regularizers. We demonstrate empirically on real and synthetic data that the learning process leads to gradients controlled beyond the training points, and results in solutions that generalize well.

Słowa kluczowe

neural network generalization gradient regularization spectral norm Frobenius norm

Wydawca

Wydawnictwo Uniwersytetu Jagiellońskiego

Czasopismo

Schedae Informaticae

Rocznik

2018

Tom

Vol. 27

Strony

31--45

Opis fizyczny

Bibliogr. 20 poz., rys.

Twórcy

autor

Varga Dániel

daniel@renyi.hu

Alfréd Rényi Institute of Mathematics, Hungarian Academy of Sciences

autor

Csiszárik Adrián

csadrian@renyi.hu

Alfréd Rényi Institute of Mathematics, Hungarian Academy of Sciences
ELTE, Institute of Mathematics, Department of Computer Science Budapest, Hungary

autor

Zombori Zsolt

zombori@renyi.hu

Alfréd Rényi Institute of Mathematics, Hungarian Academy of Sciences
ELTE, Institute of Mathematics, Department of Computer Science Budapest, Hungary

Bibliografia

1] Wojciech M. Czarnecki, Simon Osindero, Max Jaderberg, Grzegorz Swirszcz, and Razvan Pascanu. Sobolev training for neural networks. In NIPS, pages 4281–4290, 2017.
[2] H. Drucker and Y LeCun. Double backpropagation: Increasing generalization performance. In Proceedings of the International Joint Conference on Neural Networks, volume 2, pages 145–150, Seattle, WA, July 1991. IEEE Press.
[3] Shixiang Gu and Luca Rigazio. Towards deep neural network architectures robust to adversarial examples. CoRR, abs/1412.5068, 2014.
[4] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron Courville. Improved training of wasserstein gans. In Advances in Neural Information Processing Systems 30 (NIPS 2017). Curran Associates, Inc., December 2017. arxiv: 1704.00028.
[5] László Györfi, Michael Kohler, Adam Krzyzak, and Harro Walk. A DistributionFree Theory of Nonparametric Regression. Springer series in statistics. Springer, 2002.
[6] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. CoRR, abs/1512.03385, 2015.
[7] Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Francis R. Bach and David M. Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, volume 37 of JMLR Workshop and Conference Proceedings, pages 448–456. JMLR.org, 2015.
[8] Daniel Jakubovitz and Raja Giryes. Improving DNN robustness to adversarial attacks using jacobian regularization. CoRR, abs/1803.08680, 2018.
[9] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[10] Roman Novak, Yasaman Bahri, Daniel A. Abolafia, Jeffrey Pennington, and Jascha Sohl-Dickstein. Sensitivity and generalization in neural networks: an empirical study. In International Conference on Learning Representations, 2018.
[11] Alexander G. Ororbia II, Daniel Kifer, and C. Lee Giles. Unifying adversarial training algorithms with data gradient regularization. Neural Computation, 29(4):867–887, 2017.
[12] Gabriel Pereyra, George Tucker, Jan Chorowski, Lukasz Kaiser, and Geoffrey E. Hinton. Regularizing neural networks by penalizing confident output distributions. CoRR, abs/1701.06548, 2017.
[13] Lorenzo Rosasco, Silvia Villa, Sofia Mosci, Matteo Santoro, and Alessandro Verri. Nonparametric sparsity and regularization. Journal of Machine Learning Research, 14(1):1665–1714, 2013.
[14] Patrice Y. Simard, Bernard Victorri, Yann LeCun, and John S. Denker. Tangent prop - A formalism for specifying selected invariances in an adaptive network. In NIPS, pages 895–903. Morgan Kaufmann, 1991.
[15] A. Slavin Ross and F. Doshi-Velez. Improving the Adversarial Robustness and Interpretability of Deep Neural Networks by Regularizing their Input Gradients. ArXiv e-prints, November 2017.
[16] Jure Sokolic, Raja Giryes, Guillermo Sapiro, and Miguel R. D. Rodrigues. Robust large margin deep neural networks. IEEE Trans. Signal Processing, 65(16):4265–4280, 2017.
[17] Nitish Srivastava, Geoffrey Hinton, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 15:1929–1958, 2014.
[18] Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian J. Goodfellow, and Rob Fergus. Intriguing properties of neural networks. CoRR, abs/1312.6199, 2013.
[19] G. Wahba. Spline Models for Observational Data. Society for Industrial and Applied Mathematics, Philadelphia, 1990.
[20] Yuichi Yoshida and Takeru Miyato. Spectral norm regularization for improving the generalizability of deep learning. CoRR, abs/1705.10941, 2017.

Uwagi

Opracowanie rekordu w ramach umowy 509/P-DUN/2018 ze środków MNiSW przeznaczonych na działalność upowszechniającą naukę (2019).

Typ dokumentu

Bibliografia

Identyfikator YADDA

bwmeta1.element.baztech-626fbafa-6466-4be2-9cc4-d77a089cc5d8