Effects of Sparse Initialization in Deep Belief Networks

Grzegorczyk, K.; Kurdziel, M.; Wójcik, P. I.

doi:10.7494/csci.2015.16.4.313

Artykuł - szczegóły

Tytuł artykułu

Effects of Sparse Initialization in Deep Belief Networks

Autorzy

Grzegorczyk K. , Kurdziel M. , Wójcik P. I.

Treść / Zawartość

Pełne teksty:

Pobierz

Identyfikatory

DOI

10.7494/csci.2015.16.4.313

Warianty tytułu

Języki publikacji

Abstrakty

Deep neural networks are often trained in two phases: first, hidden layers are pretrained in an unsupervised manner, and then the network is fine-tuned with error backpropagation. Pretraining is often carried out using Deep Belief Networks (DBNs), with initial weights set to small random values. However, recent results established that well-designed initialization schemes, e.g., Sparse Initialization (SI), can greatly improve the performance of networks that do not use pretraining. An interesting question arising from these results is whether such initialization techniques wouldn’t also improve pretrained networks. To shed light on this question, in this work we evaluate SI in DBNs that are used to pretrain discriminative networks. The motivation behind this research is our observation that SI has an impact on the features learned by a DBN during pretraining. Our results demonstrate that this improves network performance: when pretraining starts from sparsely initialized weight matrices, networks achieve lower classification errors after fine-tuning.

Słowa kluczowe

sparse initialization Deep Belief Networks Noisy Rectified Linear Units

Wydawca

Wydawnictwa AGH

Czasopismo

Computer Science

Rocznik

2015

Tom

Vol. 16 (4)

Strony

313--327

Opis fizyczny

Bibliogr. 18 poz., rys., wykr., tab.

Twórcy

autor

Grzegorczyk K.

kgr@agh.edu.pl

AGH University of Science and Technology, Faculty of Computer Science, Electronics and Telecommunications, Department of Computer Science, Krakow, Poland

autor

Kurdziel M.

kurdziel@agh.edu.pl

AGH University of Science and Technology, Faculty of Computer Science, Electronics and Telecommunications, Department of Computer Science, Krakow, Poland

autor

Wójcik P. I.

pwojcik@agh.edu.pl

AGH University of Science and Technology, Faculty of Computer Science, Electronics and Telecommunications, Department of Computer Science, Krakow, Poland

Bibliografia

[1] Bengio Y.: Practical Recommendations for Gradient-Based Training of Deep Architectures. In: G. Montavon, G.B. Orr, K.R. M ̈uller, eds, Neural Networks: Tricks of the Trade , Lecture Notes in Computer Science , vol. 7700, pp. 437–478. Springer, Berlin–Heidelberg, 2012.
[2] Bergstra J., Bengio Y.: Random Search for Hyper-parameter Optimization. Journal of Machine Learning Research , vol. 13, pp. 281–305, 2012.
[3] Bridle J.S.: Probabilistic Interpretation of Feedforward Classification Network Outputs, with Relationships to Statistical Pattern Recognition. In: F. Souli ́e, J. H ́erault, eds, Neurocomputing, NATO ASI Series , vol. 68, pp. 227–236. Springer, Berlin–Heidelberg, 1990.
[4] Glorot X., Bengio Y.: Understanding the difficulty of training deep feedforward neural networks. In: Y.W. Teh, M. Titterington, eds, Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS) 2010 , vol. 9, pp. 249–256. JMLR Workshop and Conference Proceedings, 2010.
[5] Hinton G.E.: Training products of experts by minimizing contrastive divergence. Neural Computation , vol. 14(8), pp. 1771–1800, 2002.
[6] Hinton G.E.: A Practical Guide to Training Restricted Boltzmann Machines. In: G. Montavon, G.B. Orr, K.R. M ̈uller, eds, Neural Networks: Tricks of the Trade , Lecture Notes in Computer Science , vol. 7700, pp. 599–619. Springer, Berlin– Heidelberg, 2012.
[7] Hinton G.E., Salakhutdinov R.R.: Reducing the dimensionality of data with neural networks. Science , vol. 313(5786), pp. 504–507, 2006.
[8] LeCun Y., Bottou L., Bengio Y., Haffner P.: Gradient-based learning applied to document recognition. Proceedings of the IEEE , vol. 86(11), pp. 2278–2324, 1998.
[9] LeCun Y., Huang F.J., Bottou L.: Learning methods for generic object recognition with invariance to pose and lighting. In: Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR04) , vol. 2, pp. II–97. IEEE, 2004.
[10] Martens J.: Deep learning via Hessian-free optimization. In: J. F ̈urnkranz, T. Joachims, eds, Proceedings of the 27th International Conference on Machine Learning (ICML-10) , pp. 735–742. Omnipress, 2010.
[11] Nair V., Hinton G.E.: Rectified Linear Units Improve Restricted Boltzmann Machines. In: J. F ̈urnkranz, T. Joachims, eds, Proceedings of the 27th International Conference on Machine Learning (ICML-10) , pp. 807–814. Omnipress, 2010.
[12] Nesterov Y.: A method of solving a convex programming problem with convergence rate O (1/k2). Soviet Mathematics Doklady , vol. 27(2), pp. 372–376, 1983.
[13] Polyak B.T.: Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics , vol. 4(5), pp. 1–17, 1964.
[14] Rumelhart D.E., Hinton G.E., Williams R.J.: Learning representations by back- propagating errors. Nature , vol. 323(6088), pp. 533–536, 1986.
[15] Smolensky P.: Information Processing in Dynamical Systems: Foundations of Harmony Theory. In: D.E. Rumelhart, J.L. McClelland, CORPORATE PDP Research Group, eds, Parallel Distributed Processing: Explorations in the Microstructure of Cognition , vol. 1, pp. 194–281. MIT Press, 1986.
[16] Srivastava N.: Improving neural networks with dropout . Master’s thesis, University of Toronto, 2013.
[17] Srivastava N., Hinton G.E., Krizhevsky A., Sutskever I., Salakhutdinov R.: Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research , vol. 15(1), pp. 1929–1958, 2014.
[18] Sutskever I., Martens J., Dahl G., Hinton G.E.: On the importance of initialization and momentum in deep learning. In: S. Dasgupta, D. Mcallester, eds, Proceedings of the 30th International Conference on Machine Learning (ICML- 13) , vol. 28, pp. 1139–1147. JMLR Workshop and Conference Proceedings, 2013.

Typ dokumentu

Bibliografia

Identyfikator YADDA

bwmeta1.element.baztech-f53943c7-1019-4602-bed3-12a28645c6ce