A study on invariance of f-divergence and its application to speech recognition

  • Authors:
  • Yu Qiao; Nobuaki Minematsu

  • Affiliations:
  • Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan

  • Venue:
  • IEEE Transactions on Signal Processing
  • Year:
  • 2010

Quantified Score

Hi-index 35.68

Abstract

Identifying features invariant to certain transformations is a fundamental problem in signal processing and pattern recognition. This correspondence explores a family of measures called f-divergences, which are invariant to invertible transformations, and studies their application to speech recognition. We provide novel proofs of the sufficiency and necessity of the invariance of f-divergence. Several techniques for calculating or approximating f-divergences, both in general cases and for special distributions such as Gaussians and Gaussian mixtures, are reviewed. We show how to construct an invariant structural representation from sequence data through maximum likelihood decomposition, and prove the invariance of this decomposition. We demonstrate an application of this invariant representation to recognizing connected Japanese vowel utterances. In addition, we propose several techniques to improve the recognition performance. The experimental results show that the invariant structure achieves better performance than hidden Markov models, a widely used technique for acoustic modeling of speech sounds.
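
The invariance property described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes one-dimensional Gaussians, uses the Kullback-Leibler divergence (an f-divergence with f(t) = t log t) in its standard closed form, and restricts the invertible transformation to an affine map x -> a*x + b with a != 0. The function names `kl_gauss_1d` and `affine`, and all numeric values, are hypothetical choices made for illustration.

```python
import math


def kl_gauss_1d(m1, s1, m2, s2):
    """Closed-form KL(N(m1, s1^2) || N(m2, s2^2)) for 1-D Gaussians."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2 * s2 ** 2) - 0.5


def affine(m, s, a, b):
    """Parameters of N(m, s^2) after the invertible map x -> a*x + b (a != 0)."""
    return a * m + b, abs(a) * s


# Two illustrative distributions, given as (mean, standard deviation).
p = (0.0, 1.0)
q = (1.5, 2.0)

# An arbitrary invertible affine transformation applied to both.
a, b = -3.0, 0.7

before = kl_gauss_1d(*p, *q)
after = kl_gauss_1d(*affine(*p, a, b), *affine(*q, a, b))

# The two values agree up to floating-point rounding.
print(before, after)
```

Both distributions must pass through the same transformation for the divergence to be preserved; transforming only one of them changes the value in general.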