Supposed maximum mutual information for improving generalization and interpretation of multi-layered neural networks

Kamimura, Ryotaro

doi:10.2478/jaiscr-2018-0029

Artykuł - szczegóły

Tytuł artykułu

Supposed maximum mutual information for improving generalization and interpretation of multi-layered neural networks

Autorzy

Kamimura Ryotaro

Treść / Zawartość

Pełne teksty:

kamimura_ Supposed Maximum Mutual Information.pdf

Pobierz

Identyfikatory

DOI

10.2478/jaiscr-2018-0029

Warianty tytułu

Języki publikacji

Abstrakty

The present paper1 aims to propose a new type of information-theoretic method to maximize mutual information between inputs and outputs. The importance of mutual information in neural networks is well known, but the actual implementation of mutual information maximization has been quite difficult to undertake. In addition, mutual information has not extensively been used in neural networks, meaning that its applicability is very limited. To overcome the shortcoming of mutual information maximization, we present it here in a very simplified manner by supposing that mutual information is already maximized before learning, or at least at the beginning of learning. The method was applied to three data sets (crab data set, wholesale data set, and human resources data set) and examined in terms of generalization performance and connection weights. The results showed that by disentangling connection weights, maximizing mutual information made it possible to explicitly interpret the relations between inputs and outputs.

Słowa kluczowe

mutual information disentanglement generalization interpretation

Wydawca

University of Social Sciences

Czasopismo

Journal of Artificial Intelligence and Soft Computing Research

Rocznik

2019

Tom

Vol. 9, No. 2

Strony

123--147

Opis fizyczny

Bibliogr. 35 poz., rys.

Twórcy

autor

Kamimura Ryotaro

ryo@keyaki.cc.u-tokai.ac.jp

IT Education Center, Tokai University 4-1-1 Kitakaname, Hiratsuka, Kanagawa 259-1292, Japan

Bibliografia

[1] R. Kamimura, Mutual information maximization for improving and interpreting multi-layered neural network, in Proceedings of the 2017 IEEE Symposiumn Series on Computational Intelligence (SSCI) (SSCI 2017), 2017.
[2] R. Linsker, Self-organization in a perceptual network, Computer, vol. 21, no. 3, pp. 105–117, 1988.
[3] R. Linsker, How to generate ordered maps by maximizing the mutual information between input and output signals, Neural computation, vol. 1, no. 3, pp. 402–411, 1989.
[4] R. Linsker, Local synaptic learning rules suffice to maximize mutual information in a linear network, eural Computation, vol. 4, no. 5, pp. 691–702, 1992.
[5] R. Linsker, Improved local learning rule for information maximization and related applications, Neural networks, vol. 18, no. 3, pp. 261–265, 2005.
[6] R. Battiti, Using mutual information for selecting features in supervised neural net learning, IEEE Transactions on neural networks, vol. 5, no. 4, pp. 537–550, 1994.
[7] S. Becker, Mutual information maximization: models of cortical self-organization, Network: Computation in Neural Systems, vol. 7, pp. 7–31, 1996.
[8] G. Deco, W. Finnoff, and H. Zimmermann, Unsupervised mutual information criterion for elimination of overtraining in supervised multilayer networks, Neural Computation, vol. 7, no. 1, pp. 86–107, 1995.
[9] G. Deco and D. Obradovic, An informationtheoretic approach to neural computing. Springer Science & Business Media, 2012.
[10] J. C. Principe, D. Xu, and J. Fisher, Information theoretic learning, Unsupervised adaptive filtering, vol. 1, pp. 265–319, 2000.
[11] J. C. Principe, Information theoretic learning: Renyi’s entropy and kernel perspectives, Springer Science & Business Media, 2010.
[12] P. A. Estevez, M. Tesmer, C. A. Perez, and J. M. Zurada, Normalized mutual information feature selection, IEEE Transactions on Neural Networks, vol. 20, no. 2, pp. 189–201, 2009.
[13] P. Comon, Independent component analysis, nHigher-Order Statistics, pp. 29–38, 1992.
[14] A. J. Bell and T. J. Sejnowski, The independent components of natural scenes are edge filters, Vision research, vol. 37, no. 23, pp. 3327–3338, 1997.
[15] A. Hyväinen and E. Oja, Independent component analysis: algorithms and applications, Neural networks, vol. 13, no. 4, pp. 411–430, 2000.
[16] P. Comon, Independent component analysis: a new concept, Signal Processing, vol. 36, pp. 287–314, 1994.
[17] A. Bell and T. J. Sejnowski, An informationmaximization approach to blind separation and blind deconvolution, Neural Computation, vol. 7, no. 6, pp. 1129–1159, 1995.
[18] J. Karhunen, A. Hyvarinen, R. Vigario, J. Hurri, and E. Oja, Applications of neural blind separation to signal and image processing, in Acoustics, Speech, and Signal Processing, 1997. ICASSP-97., 1997 IEEE International Conference on, vol. 1, pp. 131–134, IEEE, 1997.
[19] H. B. Barlow, Unsupervised learning, Neural computation, vol. 1, no. 3, pp. 295–311, 1989.
[20] H. B. Barlow, T. P. Kaushal, and G. J. Mitchison, Finding minimum entropy codes, Neural Computation, vol. 1, no. 3, pp. 412–423, 1989.
[21] R. Kamimura, Simple and stable internal representation by potential mutual information maximization, in International Conference on Engineering Applications of Neural Networks, pp. 309–316, Springer, 2016.
[22] R. Kamimura, Self-organizing selective potentiality learning to detect important input neurons, in Systems, Man, and Cybernetics (SMC), 2015 IEEE International Conference on, pp. 1619–1626, IEEE, 2015.
[23] R. Kamimura, Collective interpretation and potential joint information maximization, in Intelligent information Processing VIII: 9th IFIP TC 12 International Conference, IIP 2016, Melbourne, VIC,Australia, November 18-21, 2016, Proceedings 9, pp. 12–21, 2016. Springer.
[24] R. Kamimura, Repeated potentiality assimilation: simplifying learning procedures by positive, independent and indirect operation for improving generalization and interpretation, in Neural Networks (IJCNN), 2016 International Joint Conference on, pp. 803–810, IEEE, 2016.
[25] R. Kamimura, Collective mutual information maximization to unify passive and positive approaches for improving interpretation and generalization, Neural Networks, vol. 90, pp. 56–71, 2017.
[26] R. Kamimura, Direct potentiality assimilation for improving multi-layered neural networks, in Proceedings of the 2017 Federated Conference on Computer Science and Information Systems, pp. 19–23, 2017.
[27] R. Andrews, J. Diederich, and A. B. Tickle, Survey and critique of techniques for extracting rules from trained artificial neural networks, Knowledge-based systems, vol. 8, no. 6, pp. 373–389, 1995.
[28] J. M. Benitez, J. L. Castro, and I. Requena, Are artificial neural networks black boxes?, IEEE Transactions on Neural Networks, vol. 8, no. 5, pp. 1156–1164, 1997.
[29] M. Ishikawa, Rule extraction by successive regularization, Neural Networks, vol. 13, no. 10, pp. 1171–1183, 2000.
[30] T. Q. Huynh and J. A. Reggia, Guiding hidden layer representations for improved rule extraction from neural networks, IEEE Transactions on Neural Networks, vol. 22, no. 2, pp. 264–275, 2011.
[ 31] B. Mak and T. Munakata, Rule extraction from expert heuristics: a comparative study of rough sets ith neural network and ID3, European journal of Operational Research, vol. 136, pp. 212–229, 2002.
[32] J. Yosinski, J. Clune, T. Fuchs, and H. Lipson, Understanding neural networks through deep visualization, in In ICML Workshop on Deep Learning, Citeseer, 2015.
[33] D. Erhan, Y. Bengio, A. Courville, and P. Vincent, Visualizing higher-layer features of a deep network, University of Montreal, vol. 1341, 2009.
[34] J. Schmidhuber, Deep learning in neural networks: An overview, Neural Networks, vol. 61, pp. 85–117, 2015.
[35] M. G. Cardoso, Logical discriminant models, in Quantitative Modelling In Marketing And Management, pp. 223–253, World Scientific, 2013.

Uwagi

Opracowanie rekordu w ramach umowy 509/P-DUN/2018 ze środków MNiSW przeznaczonych na działalność upowszechniającą naukę (2019).

Typ dokumentu

Bibliografia

Identyfikator YADDA

bwmeta1.element.baztech-48488c9b-f4ea-4583-a763-fb109d34933e