An investigation of dependencies between frequency components and speaker characteristics for text-independent speaker identification

Authors:
Xugang Lu;Jianwu Dang
Affiliations:
Japan Advanced Institute of Science and Technology, 1-1, Asahidai, Nomi, Ishikawa 923-1292, Japan;Japan Advanced Institute of Science and Technology, 1-1, Asahidai, Nomi, Ishikawa 923-1292, Japan
Venue:
Speech Communication
Year:
2008

Citing 3

Elements of information theory
Fundamentals of speech recognition
Speaker identification and verification using Gaussian mixture speaker models (Speech Communication)

Cited 7

An overview of text-independent speaker recognition: From features to supervectors (Speech Communication)
A non-uniform subband approach to speech-based cognitive load classification (ICICS'09 Proceedings of the 7th international conference on Information, communications and signal processing)
Text-independent speaker identification using VQ-HMM model based multiple classifier system (MICAI'10 Proceedings of the 9th Mexican international conference on Artificial intelligence conference on Advances in soft computing: Part II)
Investigation of spectral centroid features for cognitive load classification (Speech Communication)
Features extracted using frequency-time analysis approach from Nyquist filter bank and Gaussian filter bank for text-independent speaker identification (BioID'11 Proceedings of the COST 2101 European conference on Biometrics and ID management)
Robust speaker identification in the presence of car noise (International Journal of Biometrics)
Detection of speaker individual information using a phoneme effect suppression method (Speech Communication)

Abstract

Features used for speech recognition are expected to emphasize linguistic information while suppressing individual differences. Features for speaker recognition, in contrast, should preserve individual information while attenuating linguistic information. In most studies, however, identical acoustic features are used for the two different tasks of speaker recognition and speech recognition. In this paper, we first investigated the relationships between frequency components and the vocal tract from the standpoint of speech production. We found that individual information is encoded non-uniformly across the frequency bands of the speech sound. We then used the statistical Fisher's F-ratio and the information-theoretic mutual information to measure the dependencies between frequency components and individual characteristics on a speaker recognition database (NTT-VR). The analysis not only confirmed the finding from the speech production point of view that individual information is distributed non-uniformly across frequency bands, but also quantified these dependencies. Based on the quantification results, we proposed a new physiological feature for text-independent speaker identification, which uses a non-uniform subband processing strategy to emphasize the physiological information involved in speech production. The new feature was combined with GMM speaker models and applied to the NTT-VR speaker recognition database. Speaker identification with the proposed feature reduced the identification error rate by 20.1% compared with the MFCC feature. The experimental results confirmed that emphasizing features from highly individual-dependent frequency bands is a valid way to improve speaker recognition performance.
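The F-ratio measurement mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common textbook definition (variance of the per-speaker means divided by the average within-speaker variance, computed separately for each frequency band); the paper's exact formulation and band layout are not given here. The function name `f_ratio_per_band`, the input layout, and the toy values are hypothetical and are not data from NTT-VR.

```python
from statistics import mean, pvariance


def f_ratio_per_band(energies_by_speaker):
    """Return one F-ratio per frequency band.

    energies_by_speaker maps a speaker label to a list of frames;
    each frame is a list of per-band energies (hypothetical layout).
    F-ratio = variance of speaker means / mean within-speaker variance.
    """
    all_frames = list(energies_by_speaker.values())
    n_bands = len(all_frames[0][0])
    ratios = []
    for band in range(n_bands):
        # Collect this band's values separately for each speaker.
        per_speaker = [[frame[band] for frame in frames] for frames in all_frames]
        speaker_means = [mean(values) for values in per_speaker]
        between = pvariance(speaker_means)                      # spread across speakers
        within = mean([pvariance(values) for values in per_speaker])  # spread inside a speaker
        ratios.append(between / within if within > 0 else float("inf"))
    return ratios


# Toy data (made up): band 0 separates the two speakers, band 1 does not.
toy = {
    "spk_a": [[1.0, 5.0], [1.2, 3.0], [0.8, 7.0]],
    "spk_b": [[3.0, 5.5], [3.2, 3.5], [2.8, 7.5]],
}
ratios = f_ratio_per_band(toy)
print(ratios)
```

In this toy case the first band receives a much larger F-ratio than the second, which is the kind of contrast a non-uniform subband strategy would exploit by giving more resolution or weight to the more speaker-dependent bands.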