From a smooth, strictly convex function $\Phi: \mathbb{R}^n \to \mathbb{R}$, a parametric family of divergence functions $D_\Phi^{(\alpha)}$ may be introduced:

$$D_\Phi^{(\alpha)}(x, y) = \frac{4}{1-\alpha^2}\left(\frac{1-\alpha}{2}\,\Phi(x) + \frac{1+\alpha}{2}\,\Phi(y) - \Phi\!\left(\frac{1-\alpha}{2}\,x + \frac{1+\alpha}{2}\,y\right)\right)$$

for $x, y \in \operatorname{int}\operatorname{dom}(\Phi) \subset \mathbb{R}^n$ and $\alpha \in \mathbb{R}$, with $D_\Phi^{(\pm 1)}$ defined by taking the limit $\alpha \to \pm 1$. Each member is shown to induce an $\alpha$-independent Riemannian metric, as well as a pair of dual $\alpha$-connections, which are generally nonflat except for $\alpha = \pm 1$. In the latter case, $D_\Phi^{(\pm 1)}$ reduces to the (nonparametric) Bregman divergence, which is representable using $\Phi$ and its convex conjugate $\Phi^*$ and becomes the canonical divergence for dually flat spaces (Amari, 1982, 1985; Amari & Nagaoka, 2000). This formulation based on convex analysis naturally extends the information-geometric interpretation of divergence functions (Eguchi, 1983) and allows two different kinds of duality to be distinguished: referential duality ($\alpha \leftrightarrow -\alpha$) and representational duality ($\Phi \leftrightarrow \Phi^*$). For (not necessarily normalized) probability densities, the concept of conjugated representations of densities is introduced, so that the $\pm\alpha$-connections defined on probability densities embody both referential and representational duality and are hence themselves bidual. When restricted to a finite-dimensional affine submanifold, the natural parameters of a certain representation of densities and the expectation parameters under its conjugate representation form biorthogonal coordinates. The alpha representation (now indexed by $\beta$, with $\beta \in [-1, 1]$) is shown to be the only measure-invariant representation. The resulting two-parameter family of divergence functionals $D^{(\alpha,\beta)}$, $(\alpha, \beta) \in [-1, 1] \times [-1, 1]$, induces identical Fisher information but bidual alpha-connection pairs; it reduces in form to Amari's alpha-divergence family when $\alpha = \pm 1$ or when $\beta = 1$, and to the Jensen difference family (Rao, 1987) when $\beta = -1$.
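The definition of $D_\Phi^{(\alpha)}$ and its limiting behaviour can be checked numerically. The following is a minimal sketch, not code from the paper: the function names (`alpha_divergence`, `bregman`, `neg_entropy`, `grad_neg_entropy`) are hypothetical, and the choice $\Phi(p) = \sum_i (p_i \log p_i - p_i)$ on the positive orthant is an assumption made only for illustration. It evaluates the divergence for $|\alpha| \neq 1$ and compares it, for $\alpha$ close to $1$, with the Bregman divergence $\Phi(x) - \Phi(y) - \langle \nabla\Phi(y), x - y \rangle$, which is what a first-order expansion of the defining formula gives in that limit.

```python
import math

def alpha_divergence(phi, x, y, alpha):
    """Evaluate D_Phi^(alpha)(x, y) for |alpha| != 1.

    phi is a convex function taking a sequence of floats;
    x and y are equal-length sequences in the interior of its domain.
    """
    if abs(alpha) == 1:
        raise ValueError("alpha = +/-1 is defined only as a limit")
    a = (1 - alpha) / 2  # weight on x
    b = (1 + alpha) / 2  # weight on y
    mix = [a * xi + b * yi for xi, yi in zip(x, y)]
    return 4 / (1 - alpha ** 2) * (a * phi(x) + b * phi(y) - phi(mix))

def bregman(phi, grad_phi, x, y):
    """Bregman divergence Phi(x) - Phi(y) - <grad Phi(y), x - y>."""
    g = grad_phi(y)
    inner = sum(gi * (xi - yi) for gi, xi, yi in zip(g, x, y))
    return phi(x) - phi(y) - inner

# Illustrative assumption: Phi(p) = sum_i (p_i log p_i - p_i), p_i > 0.
def neg_entropy(p):
    return sum(pi * math.log(pi) - pi for pi in p)

def grad_neg_entropy(p):
    return [math.log(pi) for pi in p]

if __name__ == "__main__":
    x = [0.2, 0.5, 1.3]
    y = [0.4, 0.1, 0.9]
    for alpha in (0.0, 0.5, 0.9, 0.999999):
        print(alpha, alpha_divergence(neg_entropy, x, y, alpha))
    print("Bregman:", bregman(neg_entropy, grad_neg_entropy, x, y))
```

Swapping the arguments together with the sign of $\alpha$ leaves the value unchanged, $D_\Phi^{(\alpha)}(x, y) = D_\Phi^{(-\alpha)}(y, x)$, which is the referential duality described in the abstract. For the quadratic $\Phi(x) = \tfrac{1}{2}\lVert x \rVert^2$ the divergence equals $\tfrac{1}{2}\lVert x - y \rVert^2$ for every $\alpha$.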