Pitch correlogram clustering for fast speaker identification

Authors:
Nitin Jhanwar; Ajay K. Raina
Affiliations:
Research and Development Division, Danlaw Technologies India Limited, Hyderabad, India; Research and Development Division, Danlaw Technologies India Limited, Hyderabad, India and Department of Electrical and Electronic Engineering, The University of Melbourne, Victoria, Australia
Venue:
EURASIP Journal on Applied Signal Processing
Year:
2004

Abstract

Gaussian mixture models (GMMs) are commonly used in text-independent speaker identification systems. For large speaker databases, however, their high run-time computational cost limits their use in online or real-time speaker identification. Two-stage identification systems, in which the database is partitioned into clusters based on some proximity criterion and only a single-cluster GMM is run in each test, have been suggested in the literature to speed up the identification process. Most of the clustering algorithms used so far have shown limited success, apparently because the clustering and GMM feature spaces are derived from similar speech characteristics. This paper presents a new clustering approach based on the concept of a pitch correlogram, which captures the frame-to-frame pitch variations of a speaker rather than short-time spectral characteristics such as cepstral coefficients, spectral slopes, and so forth. The effectiveness of this two-stage identification process is demonstrated on the IVIE corpus of 110 speakers. The overall system achieves a run-time advantage of 500% as well as a 10% reduction of error in overall speaker identification.
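
The two-stage idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define how the pitch correlogram is computed, so the clustering feature is treated here as an opaque vector, a single diagonal Gaussian stands in for each speaker's GMM, and stage two is assumed to score only the speakers assigned to the selected cluster. All names (`identify`, `centroids`, `members`, `models`) and all numbers are hypothetical.

```python
import math


def log_gauss(x, mean, var):
    """Log-density of a diagonal Gaussian (assumed stand-in for a full GMM)."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def identify(cluster_feat, frames, centroids, members, models):
    """Two-stage lookup: choose a cluster, then score only its speakers."""
    # Stage 1: nearest centroid in the clustering feature space
    # (the paper uses a pitch-correlogram feature; here it is just a vector).
    cluster = min(centroids, key=lambda c: sq_dist(cluster_feat, centroids[c]))

    # Stage 2: sum frame log-likelihoods for speakers in that cluster only.
    def score(speaker):
        mean, var = models[speaker]
        return sum(log_gauss(f, mean, var) for f in frames)

    return cluster, max(members[cluster], key=score)


# Hypothetical toy data: two clusters with two speakers each.
centroids = {"A": [0.0], "B": [10.0]}
members = {"A": ["s1", "s2"], "B": ["s3", "s4"]}
models = {
    "s1": ([0.0, 0.0], [1.0, 1.0]),
    "s2": ([3.0, 3.0], [1.0, 1.0]),
    "s3": ([0.1, 0.1], [1.0, 1.0]),
    "s4": ([6.0, 6.0], [1.0, 1.0]),
}

result = identify([1.0], [[2.8, 3.1], [3.2, 2.9]], centroids, members, models)
print(result)
```

In this sketch only the models in the chosen cluster are evaluated, which is where a run-time saving would come from. It also shows why the two feature spaces matter: if stage one picks the wrong cluster, stage two cannot recover the correct speaker.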