A study on invariance of f-divergence and its application to speech recognition

Authors:
Yu Qiao; Nobuaki Minematsu
Affiliations:
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan
Venue:
IEEE Transactions on Signal Processing
Year:
2010

Abstract

Identifying features invariant to certain transformations is a fundamental problem in the fields of signal processing and pattern recognition. This correspondence explores a family of measures called f-divergences that are invariant to invertible transformations, and studies their application to speech recognition. We provide novel proofs for the sufficiency and necessity of the invariance of f-divergence. Several techniques to calculate or approximate f-divergences in general cases and for special distributions such as Gaussian and Gaussian mixture are reviewed. We show how to construct an invariant structural representation from sequence data through maximum likelihood decomposition, and prove the invariance of this decomposition. We demonstrate an application of this invariant representation to recognizing connected Japanese vowel utterances. In addition, we propose several techniques to improve the recognition performance. The experimental results show that the invariant structure achieves better performance than hidden Markov models, a widely used technique for acoustic modeling of speech sounds.
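
The invariance claim can be checked numerically in a simple special case. The sketch below is an illustration, not the authors' code: it assumes one-dimensional Gaussians and an invertible affine map x -> a*x + b (a != 0), under which a Gaussian with mean m and standard deviation s becomes a Gaussian with mean a*m + b and standard deviation |a|*s. It uses the standard closed forms for two members of the f-divergence family, the Kullback-Leibler divergence and the squared Hellinger distance (the latter up to a normalization convention). The function names and the numeric values are hypothetical, chosen only for the example.

```python
import math


def kl_gauss(m1, s1, m2, s2):
    """Closed-form KL(N(m1, s1^2) || N(m2, s2^2)) for 1-D Gaussians."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2.0 * s2 ** 2) - 0.5


def hellinger2_gauss(m1, s1, m2, s2):
    """Squared Hellinger distance, 1 minus the Bhattacharyya coefficient."""
    var_sum = s1 ** 2 + s2 ** 2
    bc = math.sqrt(2.0 * s1 * s2 / var_sum) * math.exp(-((m1 - m2) ** 2) / (4.0 * var_sum))
    return 1.0 - bc


def affine(m, s, a, b):
    """Parameters of the Gaussian obtained by mapping x -> a*x + b (a != 0)."""
    return a * m + b, abs(a) * s


# Two example distributions (illustrative values).
p = (0.0, 1.0)   # mean, standard deviation
q = (2.0, 1.5)

# An invertible affine transformation (illustrative values).
a, b = -3.0, 7.0
p_t = affine(*p, a, b)
q_t = affine(*q, a, b)

kl_before = kl_gauss(*p, *q)
kl_after = kl_gauss(*p_t, *q_t)
h2_before = hellinger2_gauss(*p, *q)
h2_after = hellinger2_gauss(*p_t, *q_t)

print("KL before/after:        ", kl_before, kl_after)
print("Hellinger^2 before/after:", h2_before, h2_after)
```

Both pairs of values agree up to floating-point rounding, while the means and variances themselves change under the transformation. The abstract's claim is broader than this check: it covers invertible transformations in general, not only affine ones.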