Article title

On Loss Functions for Deep Neural Networks in Classification

Identifiers
Title variants
Publication languages
EN
Abstracts
EN
Deep neural networks are currently among the most commonly used classifiers. Apart from easily achieving very good performance, one of the strongest selling points of these models is their modular design: one can conveniently adapt their architecture to specific needs, change connectivity patterns, attach specialised layers, and experiment with a wide range of activation functions, normalisation schemes and many other options. Yet while an impressively broad spectrum of configurations can be found for almost every aspect of deep nets, one element is, in the authors' opinion, underrepresented: when solving classification problems, the vast majority of papers and applications simply use log loss. In this paper we investigate how particular choices of loss function affect deep models and their learning dynamics, as well as the robustness of the resulting classifiers to various effects. We perform experiments on classical datasets and also provide additional theoretical insights into the problem. In particular, we show that, quite surprisingly, the L1 and L2 losses are justified classification objectives for deep nets, by giving them a probabilistic interpretation in terms of expected misclassification. We also introduce two losses that are not typically used as deep-net objectives and show that they are viable alternatives to the existing ones.
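As a rough illustration of the objectives compared in the abstract, a minimal NumPy sketch of the log, L1, and L2 losses applied to softmax outputs might look as follows. This is not the authors' code; the function names and the toy data are illustrative assumptions only.

import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def log_loss(probs, y_onehot, eps=1e-12):
    # Cross-entropy between one-hot targets and predicted probabilities.
    return -np.mean(np.sum(y_onehot * np.log(probs + eps), axis=-1))

def l1_loss(probs, y_onehot):
    # Mean absolute difference between predicted and target distributions.
    return np.mean(np.sum(np.abs(probs - y_onehot), axis=-1))

def l2_loss(probs, y_onehot):
    # Mean squared difference between predicted and target distributions.
    return np.mean(np.sum((probs - y_onehot) ** 2, axis=-1))

# Hypothetical example: three samples, four classes.
logits = np.array([[2.0, 0.5, -1.0, 0.1],
                   [0.2, 1.5, 0.3, -0.5],
                   [-0.3, 0.1, 0.0, 2.2]])
targets = np.eye(4)[[0, 1, 3]]
p = softmax(logits)
print(log_loss(p, targets), l1_loss(p, targets), l2_loss(p, targets))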
Keywords
Year
Volume
Pages
49–59
Physical description
Bibliography: 13 items, figures
Contributors
author
  • Faculty of Mathematics and Computer Science, Jagiellonian University, Kraków, Poland
  • Faculty of Mathematics and Computer Science, Jagiellonian University, Kraków, Poland
  • DeepMind, London, UK
Bibliography
  • [1] Larochelle H., Bengio Y., Louradour J., Lamblin P., Exploring strategies for training deep neural networks. Journal of Machine Learning Research, 2009, 10 (Jan), pp. 1–40.
  • [2] Krizhevsky A., Sutskever I., Hinton G.E., ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
  • [3] Oord A.v.d., Dieleman S., Zen H., Simonyan K., Vinyals O., Graves A., Kalchbrenner N., Senior A., Kavukcuoglu K., WaveNet: A generative model for raw audio. arXiv preprint arXiv:1609.03499, 2016.
  • [4] Silver D., Huang A., Maddison C.J., Guez A., Sifre L., van den Driessche G., Schrittwieser J., Antonoglou I., Panneershelvam V., Lanctot M., et al., Mastering the game of Go with deep neural networks and tree search. Nature, 2016, 529 (7587), pp. 484–489.
  • [5] Clevert D.A., Unterthiner T., Hochreiter S., Fast and accurate deep network learning by exponential linear units (ELUs). arXiv preprint arXiv:1511.07289, 2015.
  • [6] Kingma D., Ba J., Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • [7] Tang Y., Deep learning using linear support vector machines. arXiv preprint arXiv:1306.0239, 2013.
  • [8] Lee C.Y., Xie S., Gallagher P., Zhang Z., Tu Z., Deeply-supervised nets. In: AISTATS, vol. 2, 2015, p. 6.
  • [9] Choromanska A., Henaff M., Mathieu M., Arous G.B., LeCun Y., The loss surfaces of multilayer networks. In: AISTATS, 2015.
  • [10] Czarnecki W.M., Jozefowicz R., Tabor J., Maximum entropy linear manifold for learning discriminative low-dimensional representation. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, 2015, pp. 52–67.
  • [11] LeCun Y., Cortes C., Burges C.J., The MNIST database of handwritten digits, 1998.
  • [12] Srivastava N., Hinton G.E., Krizhevsky A., Sutskever I., Salakhutdinov R., Dropout: a simple way to prevent neural networks from overfitting. Journal of Machine Learning Research, 2014, 15 (1), pp. 1929–1958.
  • [13] Principe J.C., Xu D., Fisher J., Information theoretic learning. Unsupervised adaptive filtering, 2000, 1, pp. 265–319.
Notes
EN
Prepared with funds from the Polish Ministry of Science and Higher Education (MNiSW) under agreement 812/P-DUN/2016 for activities promoting science (2017 tasks).
Document type
YADDA identifier
bwmeta1.element.baztech-30227f10-ea1b-4095-9e5c-46d8c36d2ea6