Selecting feature frames for automatic speaker recognition using mutual information

Authors:
Chi-Sang Jung; Moo Young Kim; Hong-Goo Kang
Affiliations:
School of Electrical and Electronic Engineering, Biometrics Engineering Research Center, Yonsei University, Seoul, Korea; Department of Information and Communication Engineering, Biometrics Engineering Research Center, Sejong University, Seoul, Korea; School of Electrical and Electronic Engineering, Yonsei University, Seoul, Korea
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2010

Abstract

In this paper, an information-theoretic approach to selecting feature frames for speaker recognition systems is proposed. The conventional approach, in which the frame shift is fixed at about half the frame length, may not be the best choice, because the characteristics of the speech signal can change rapidly, especially at phonetic boundaries. Experimental results show that recognition accuracy increases when the frame interval is controlled directly using phonetic information. Combining these results with the well-known fact that recognition accuracy is directly correlated with the amount of mutual information, this paper proposes a novel feature frame selection method for speaker recognition. Specifically, feature frames are chosen to have minimum redundancy among the selected frames but maximum relevance to the speaker models. Experiments verify that the proposed method yields consistent improvements, especially in a speaker verification system, and that it is robust to variations in the acoustic environment.
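The selection criterion described in the abstract can be sketched as a greedy minimum-redundancy, maximum-relevance (mRMR) loop. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how mutual information is estimated or how relevance to the speaker models is scored. The sketch therefore assumes three things. Each frame is already quantized into a tuple of integer symbols. Redundancy between two frames is a plug-in (histogram) mutual-information estimate over their paired components. Per-frame relevance scores are supplied by the caller as hypothetical inputs. The helper names `discrete_mutual_information`, `select_frames_mrmr`, and `frame_redundancy` are illustrative only. Each step adds the frame that maximizes its relevance minus its mean redundancy to the frames already chosen, which is the difference form of the mRMR criterion.

```python
from collections import Counter
from math import log
from typing import Callable, List, Sequence


def discrete_mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Plug-in (histogram) estimate of I(X;Y) in nats from paired discrete samples."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    # Sum of p(a,b) * log(p(a,b) / (p(a) p(b))), written with raw counts.
    return sum((c / n) * log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def select_frames_mrmr(
    relevance: Sequence[float],
    redundancy: Callable[[int, int], float],
    k: int,
) -> List[int]:
    """Greedily pick k frame indices: maximize relevance minus mean redundancy to the picked set."""
    n = len(relevance)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and len(relevance)")
    selected: List[int] = []
    remaining = set(range(n))

    def score(i: int) -> float:
        if not selected:
            return relevance[i]  # first pick: relevance only
        penalty = sum(redundancy(i, j) for j in selected) / len(selected)
        return relevance[i] - penalty

    while len(selected) < k:
        # Iterate in index order so ties go to the lowest index (max keeps the first maximum).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy usage with hypothetical quantized frames and hypothetical relevance scores.
frames = [
    (0, 1, 2, 3),
    (0, 1, 2, 3),  # exact duplicate of frame 0
    (3, 2, 1, 0),
    (0, 0, 1, 1),
]
relevance = [1.0, 1.0, 0.9, 0.5]


def frame_redundancy(i: int, j: int) -> float:
    # Hypothetical redundancy measure: MI between the two frames' quantized components.
    return discrete_mutual_information(frames[i], frames[j])


print(select_frames_mrmr(relevance, frame_redundancy, 2))  # [0, 3]
```

In this toy run, frame 1 duplicates frame 0. Once frame 0 is chosen, frame 1's redundancy penalty (log 4, about 1.39 nats) outweighs its high relevance. The less relevant but less redundant frame 3 is picked instead.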