Divergence function, duality, and convex analysis

  • Authors: Jun Zhang
  • Affiliations: Department of Psychology, University of Michigan, Ann Arbor, MI
  • Venue: Neural Computation
  • Year: 2004

Abstract

From a smooth, strictly convex function $\Phi: \mathbb{R}^n \to \mathbb{R}$, a parametric family of divergence functions $D_\Phi^{(\alpha)}$ may be introduced:

$$D_\Phi^{(\alpha)}(x, y) = \frac{4}{1-\alpha^2}\left(\frac{1-\alpha}{2}\,\Phi(x) + \frac{1+\alpha}{2}\,\Phi(y) - \Phi\!\left(\frac{1-\alpha}{2}\,x + \frac{1+\alpha}{2}\,y\right)\right)$$

for $x, y \in \operatorname{int}\,\operatorname{dom}(\Phi) \subset \mathbb{R}^n$ and for $\alpha \in \mathbb{R}$, with $D_\Phi^{(\pm 1)}$ defined by taking the limit $\alpha \to \pm 1$. Each member is shown to induce an $\alpha$-independent Riemannian metric, as well as a pair of dual $\alpha$-connections, which are generally nonflat, except for $\alpha = \pm 1$. In the latter case, $D_\Phi^{(\pm 1)}$ reduces to the (nonparametric) Bregman divergence, which is representable using $\Phi$ and its convex conjugate $\Phi^*$ and becomes the canonical divergence for dually flat spaces (Amari, 1982, 1985; Amari & Nagaoka, 2000). This formulation based on convex analysis naturally extends the information-geometric interpretation of divergence functions (Eguchi, 1983) to allow the distinction between two different kinds of duality: referential duality ($\alpha \leftrightarrow -\alpha$) and representational duality ($\Phi \leftrightarrow \Phi^*$). When applied to (not necessarily normalized) probability densities, the concept of conjugated representations of densities is introduced, so that $\pm\alpha$-connections defined on probability densities embody both referential and representational duality and are hence themselves bidual. When restricted to a finite-dimensional affine submanifold, the natural parameters of a certain representation of densities and the expectation parameters under its conjugate representation form biorthogonal coordinates. The alpha representation (now indexed by $\beta$, $\beta \in [-1, 1]$) is shown to be the only measure-invariant representation. The resulting two-parameter family of divergence functionals $D^{(\alpha,\beta)}$, $(\alpha, \beta) \in [-1, 1] \times [-1, 1]$, induces identical Fisher information but bidual alpha-connection pairs; it reduces in form to Amari's alpha-divergence family when $\alpha = \pm 1$ or when $\beta = 1$, but to the family of Jensen difference (Rao, 1987) when $\beta = -1$.
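
The definition in the abstract can be checked numerically. The following is a minimal sketch, not code from the paper: the choice of Φ (negative entropy on the positive orthant), its conjugate, and all function names are assumptions made for illustration. It evaluates the α-divergence, compares its value near α = ±1 with the Bregman divergence, and writes the Bregman divergence using Φ and its convex conjugate Φ*.

```python
import numpy as np


def phi(x):
    """Assumed example of a smooth, strictly convex function:
    Phi(x) = sum_i (x_i log x_i - x_i), defined for x_i > 0."""
    x = np.asarray(x, dtype=float)
    return float(np.sum(x * np.log(x) - x))


def grad_phi(x):
    """Gradient of the assumed Phi: (grad Phi)(x)_i = log x_i."""
    return np.log(np.asarray(x, dtype=float))


def phi_conj(u):
    """Convex conjugate of the assumed Phi: Phi*(u) = sum_i exp(u_i)."""
    return float(np.sum(np.exp(np.asarray(u, dtype=float))))


def alpha_divergence(x, y, alpha):
    """D_Phi^(alpha)(x, y) as defined in the abstract, for alpha != +/-1."""
    if abs(alpha) == 1:
        raise ValueError("alpha = +/-1 is defined only as a limit; use bregman()")
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a = (1.0 - alpha) / 2.0  # weight on x
    b = (1.0 + alpha) / 2.0  # weight on y
    gap = a * phi(x) + b * phi(y) - phi(a * x + b * y)
    return 4.0 / (1.0 - alpha**2) * gap


def bregman(x, y):
    """Bregman divergence Phi(x) - Phi(y) - <grad Phi(y), x - y>."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return phi(x) - phi(y) - float(np.dot(grad_phi(y), x - y))


def canonical_divergence(x, v):
    """Same quantity written with the conjugate: Phi(x) + Phi*(v) - <x, v>,
    where v = grad Phi(y) is the dual coordinate of y."""
    x = np.asarray(x, dtype=float)
    v = np.asarray(v, dtype=float)
    return phi(x) + phi_conj(v) - float(np.dot(x, v))


# Arbitrary illustrative points in the positive orthant.
x = np.array([0.2, 0.5, 1.3])
y = np.array([0.4, 0.4, 0.9])

print("D^(0.3)(x, y)        :", alpha_divergence(x, y, 0.3))
print("D^(-0.3)(y, x)       :", alpha_divergence(y, x, -0.3))
print("D^(alpha -> 1)(x, y) :", alpha_divergence(x, y, 1 - 1e-6))
print("Bregman(x, y)        :", bregman(x, y))
print("canonical form       :", canonical_divergence(x, grad_phi(y)))
```

Under these assumptions, the first two printed values illustrate referential duality (swapping α ↔ -α together with x ↔ y leaves the divergence unchanged), while the last three should agree up to the step size used to approach α = 1.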