Article title

Information geometry of divergence functions

Publication languages
EN
Abstracts
EN
Measures of divergence between two points play a key role in many engineering problems. One such measure is a distance function, but there are many important measures which do not satisfy the properties of a distance. The Bregman divergence, Kullback-Leibler divergence and f-divergence are such measures. In the present article, we study the differential-geometrical structure of a manifold induced by a divergence function. It consists of a Riemannian metric and a pair of dually coupled affine connections, which are studied in information geometry. The class of Bregman divergences is characterized by a dually flat structure, which originates from the Legendre duality. A dually flat space admits a generalized Pythagorean theorem. The class of f-divergences, defined on a manifold of probability distributions, is characterized by information monotonicity, and the Kullback-Leibler divergence belongs to the intersection of both classes. The f-divergence always gives the α-geometry, which consists of the Fisher information metric and a dual pair of ±α-connections. The α-divergence is a special class of f-divergences. It is unique, sitting at the intersection of the f-divergence and Bregman divergence classes in a manifold of positive measures. The geometry derived from the Tsallis q-entropy and related divergences is also addressed.
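As a concrete illustration of the relation between the Bregman and Kullback-Leibler divergences described in the abstract, the following minimal Python sketch (not part of the original article; the function names and the two test distributions are purely illustrative) evaluates the Bregman divergence generated by the negative Shannon entropy and checks numerically that, on the probability simplex, it coincides with the Kullback-Leibler divergence.

import numpy as np

def bregman_divergence(phi, grad_phi, p, q):
    """Bregman divergence D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>."""
    return phi(p) - phi(q) - np.dot(grad_phi(q), p - q)

def neg_entropy(p):
    """Negative Shannon entropy: the convex generator whose Bregman divergence is KL."""
    return np.sum(p * np.log(p))

def grad_neg_entropy(p):
    """Gradient of the negative entropy."""
    return np.log(p) + 1.0

def kl_divergence(p, q):
    """Kullback-Leibler divergence between two probability vectors."""
    return np.sum(p * np.log(p / q))

# Two hypothetical probability distributions on three points (illustrative values).
p = np.array([0.2, 0.5, 0.3])
q = np.array([0.4, 0.4, 0.2])

# On the probability simplex the constant part of <grad phi(q), p - q> cancels,
# so the Bregman divergence of the negative entropy reduces exactly to the KL divergence.
print(bregman_divergence(neg_entropy, grad_neg_entropy, p, q))  # ~0.0946
print(kl_divergence(p, q))                                      # ~0.0946

The same construction with the generator φ(p) = ½‖p‖² yields half the squared Euclidean distance, the simplest example of a dually flat divergence.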
Bibliography
  • [1] S. Amari and H. Nagaoka, Methods of Information Geometry, Oxford University Press, New York, 2000.
  • [2] A. Cichocki, R. Zdunek, A.H. Phan, and S. Amari, Nonnegative Matrix and Tensor Factorizations, John Wiley, New York, 2009.
  • [3] F. Nielsen, “Emerging trends in visual computing”, Lecture Notes in Computer Science 6, CD-ROM (2009).
  • [4] L. Bregman, “The relaxation method of finding a common point of convex sets and its application to the solution of problems in convex programming”, Comp. Math. Phys., USSR 7, 200–217 (1967).
  • [5] A. Banerjee, S. Merugu, I.S. Dhillon, and J. Ghosh, “Clustering with Bregman divergences”, J. Machine Learning Research 6, 1705–1749 (2005).
  • [6] S.M. Ali and S.D. Silvey, “A general class of coefficients of divergence of one distribution from another”, J. Royal Statistical Society B 28, 131–142 (1966).
  • [7] I. Csiszár, “Information-type measures of difference of probability distributions and indirect observations”, Studia Sci. Math. 2, 299–318 (1967).
  • [8] I. Csiszár, “Information measures: a critical survey”, Transaction of the 7th Prague Conf. 1, 83–86 (1974).
  • [9] I. Taneja and P. Kumar, “Relative information of type s, Csiszár’s f-divergence, and information inequalities”, Information Sciences 166, 105–125 (2004).
  • [10] I. Csiszár, “Why least squares and maximum entropy? An axiomatic approach to inference for linear problems”, Annals of Statistics 19, 2032–2066 (1991).
  • [11] N.N. Chentsov, Statistical Decision Rules and Optimal Inference, American Mathematical Society, New York, 1972.
  • [12] I. Csiszár, “Axiomatic characterizations of information measures”, Entropy 10, 261–273 (2008).
  • [13] G. Pistone and C. Sempi, “An infinite-dimensional geometric structure on the space of all the probability measures equivalent to a given one”, Annals of Statistics 23, 1543–1561 (1995).
  • [14] S. Amari, “Alpha divergence is unique, belonging to both classes of f-divergence and Bregman divergence”, IEEE Trans. Information Theory 55, 4925–4931 (2009).
  • [15] C. Tsallis, “Possible generalization of Boltzmann-Gibbs statistics”, J. Stat. Phys. 52, 479–487 (1988).
  • [16] A. Rényi, “On measures of entropy and information”, Proc. 4th Berk. Symp. Math. Statist. and Prob. 1, 547–561 (1961).
  • [17] J. Naudts, “Estimators, escort probabilities, and phi-exponential families in statistical physics”, J. Ineq. Pure App. Math. 5, 102 (2004).
  • [18] J. Naudts, “Generalized exponential families and associated entropy functions”, Entropy 10, 131–149 (2008).
  • [19] H. Suyari, “Mathematical structures derived from q-multinomial coefficient in Tsallis statistics”, Physica A 368, 63–82 (2006).
  • [20] S. Amari, “Information geometry and its applications: convex function and dually flat manifold”, Emerging Trends in Visual Computing, Lecture Notes in Computer Science 5416 (2009).
  • [21] M.R. Grasselli, “Duality, monotonicity and Wigner-Yanase-Dyson metrics”, Infinite Dimensional Analysis, Quantum Probability and Related Topics 7, 215–232 (2004).
  • [22] H. Hasegawa, “α-divergence of the non-commutative information geometry”, Reports on Mathematical Physics 33, 87–93 (1993).
  • [23] D. Petz, “Monotone metrics on matrix spaces”, Linear Algebra and its Applications 244, 81–96 (1996).
  • [24] I.S. Dhillon and J.A. Tropp, “Matrix nearness problems with Bregman divergences”, SIAM J. on Matrix Analysis and Applications 29, 1120–1146 (2007).
  • [25] Yu. Nesterov and M.J. Todd, “On the Riemannian geometry defined by self-concordant barriers and interior-point methods”, Foundations of Computational Mathematics 2, 333–361 (2002).
  • [26] A. Ohara and T. Tsuchiya, An Information Geometric Approach to Polynomial-time Interior-point Algorithms, (to be published).
  • [27] A. Ohara, “Information geometric analysis of an interior point method for semidefinite programming”, Geometry in Present Day Science 1, 49–74 (1999).
  • [28] N. Murata, T. Takenouchi, T. Kanamori, and S. Eguchi, “Information geometry of U-boost and Bregman divergence”, Neural Computation 26, 1651–1686 (2004).
  • [29] S. Eguchi and J. Copas, “A class of logistic-type discriminant functions”, Biometrika 89, 1–22 (2002).
  • [30] M. Minami and S. Eguchi, “Robust blind source separation by beta-divergence”, Neural Computation 14, 1859–1886 (2004).
  • [31] H. Fujisawa and S. Eguchi, “Robust parameter estimation with a small bias against heavy contamination”, J. Multivariate Analysis 99, 2053–2081 (2008).
  • [32] J. Havrda and F. Charvát, “Quantification method of classification processes. Concept of structural α-entropy”, Kybernetika 3, 30–35 (1967).
  • [33] H. Chernoff, “A measure of asymptotic efficiency for tests of a hypothesis based on a sum of observations”, Annals of Mathematical Statistics 23, 493–507 (1952).
  • [34] S. Amari, “Integration of stochastic models by minimizing α-divergence”, Neural Computation 19, 2780–2796 (2007).
  • [35] Y. Matsuyama, “The α-EM algorithm: Surrogate likelihood maximization using α-logarithmic information measures”, IEEE Trans. on Information Theory 49, 672–706 (2002).
  • [36] J. Zhang, “Divergence function, duality, and convex analysis”, Neural Computation 16, 159–195 (2004).
  • [37] S. Eguchi, “Second order efficiency of minimum contrast estimators in a curved exponential family”, Annals of Statistics 11, 793–803 (1983).
YADDA identifier
bwmeta1.element.baztech-article-BPG8-0020-0019