Divergence function, duality, and convex analysis

Authors:
Jun Zhang
Affiliations:
Department of Psychology, University of Michigan, Ann Arbor, MI
Venue:
Neural Computation
Year:
2004

Abstract

From a smooth, strictly convex function $\Phi: \mathbb{R}^n \to \mathbb{R}$, a parametric family of divergence functions $D_\Phi^{(\alpha)}$ may be introduced:

$$D_\Phi^{(\alpha)}(x, y) = \frac{4}{1-\alpha^2}\left(\frac{1-\alpha}{2}\,\Phi(x) + \frac{1+\alpha}{2}\,\Phi(y) - \Phi\!\left(\frac{1-\alpha}{2}\,x + \frac{1+\alpha}{2}\,y\right)\right)$$

for $x, y \in \operatorname{int}\operatorname{dom}(\Phi) \subset \mathbb{R}^n$ and $\alpha \in \mathbb{R}$, with $D_\Phi^{(\pm 1)}$ defined by taking the limit $\alpha \to \pm 1$. Each member is shown to induce an $\alpha$-independent Riemannian metric, as well as a pair of dual $\alpha$-connections, which are generally nonflat, except for $\alpha = \pm 1$. In the latter case, $D_\Phi^{(\pm 1)}$ reduces to the (nonparametric) Bregman divergence, which is representable using $\Phi$ and its convex conjugate $\Phi^*$ and becomes the canonical divergence for dually flat spaces (Amari, 1982, 1985; Amari & Nagaoka, 2000). This formulation based on convex analysis naturally extends the information-geometric interpretation of divergence functions (Eguchi, 1983) to allow the distinction between two different kinds of duality: referential duality ($\alpha \leftrightarrow -\alpha$) and representational duality ($\Phi \leftrightarrow \Phi^*$). When applied to (not necessarily normalized) probability densities, the concept of conjugated representations of densities is introduced, so that $\pm\alpha$-connections defined on probability densities embody both referential and representational duality and are hence themselves bidual. When restricted to a finite-dimensional affine submanifold, the natural parameters of a certain representation of densities and the expectation parameters under its conjugate representation form biorthogonal coordinates. The alpha representation (indexed by $\beta$ now, $\beta \in [-1, 1]$) is shown to be the only measure-invariant representation. The resulting two-parameter family of divergence functionals $D^{(\alpha,\beta)}$, $(\alpha,\beta) \in [-1, 1] \times [-1, 1]$, induces identical Fisher information but bidual alpha-connection pairs; it reduces in form to Amari's alpha-divergence family when $\alpha = \pm 1$ or when $\beta = 1$, but to the Jensen difference family (Rao, 1987) when $\beta = -1$.
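
The definition in the abstract can be checked numerically. The following is a minimal sketch, not code from the paper: it assumes the particular potential Φ(x) = Σ (x_i log x_i − x_i) on positive vectors, whose convex conjugate is Φ*(u) = Σ exp(u_i), and all function names are hypothetical. Under those assumptions it illustrates three statements: the α → ±1 limits approach the Bregman divergence, swapping α ↔ −α together with x ↔ y leaves the value unchanged (referential duality), and the Bregman divergence can be written with Φ and Φ* (representational duality).

```python
import math

def phi(x):
    # Assumed example potential: sum of x_i*log(x_i) - x_i, strictly convex for x_i > 0.
    return sum(xi * math.log(xi) - xi for xi in x)

def grad_phi(x):
    # Gradient of the assumed potential: log(x_i) componentwise.
    return [math.log(xi) for xi in x]

def phi_conj(u):
    # Convex conjugate of the assumed potential: sum of exp(u_i).
    return sum(math.exp(ui) for ui in u)

def alpha_divergence(x, y, alpha):
    # D_Phi^(alpha)(x, y) as defined in the abstract, valid for alpha != +1, -1.
    a = (1.0 - alpha) / 2.0
    b = (1.0 + alpha) / 2.0
    mix = [a * xi + b * yi for xi, yi in zip(x, y)]
    return 4.0 / (1.0 - alpha ** 2) * (a * phi(x) + b * phi(y) - phi(mix))

def bregman(x, y):
    # B_Phi(x, y) = Phi(x) - Phi(y) - <grad Phi(y), x - y>.
    g = grad_phi(y)
    return phi(x) - phi(y) - sum(gi * (xi - yi) for gi, xi, yi in zip(g, x, y))

def bregman_conjugate_form(x, y):
    # Same quantity written with the conjugate: Phi(x) + Phi*(u) - <x, u>, u = grad Phi(y).
    u = grad_phi(y)
    return phi(x) + phi_conj(u) - sum(xi * ui for xi, ui in zip(x, u))

x = [0.2, 1.5, 3.0]
y = [0.7, 1.0, 2.0]

# Limits alpha -> +1 and alpha -> -1, approximated with alpha close to the endpoint.
d_near_plus = alpha_divergence(x, y, 1.0 - 1e-5)
d_near_minus = alpha_divergence(x, y, -1.0 + 1e-5)

print("alpha near +1:", d_near_plus, " Bregman(x, y):", bregman(x, y))
print("alpha near -1:", d_near_minus, " Bregman(y, x):", bregman(y, x))
print("referential duality gap:",
      abs(alpha_divergence(x, y, 0.3) - alpha_divergence(y, x, -0.3)))
print("conjugate-form gap:", abs(bregman(x, y) - bregman_conjugate_form(x, y)))
```

With this choice of Φ the Bregman divergence is the generalized Kullback-Leibler divergence between positive vectors, so the sketch covers only that one special case; other smooth, strictly convex potentials could be substituted by replacing `phi`, `grad_phi`, and `phi_conj` together.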