Text-independent speaker identification system based on the histogram of DCT-cepstrum coefficients

Authors:
S. Al-Rawahy; A. Hossen; U. Heute
Affiliations:
Department of Electrical and Computer Engineering, Sultan Qaboos University, Muscat, Oman; Department of Electrical and Computer Engineering, Sultan Qaboos University, Muscat, Oman; Institute for Circuit and System Theory, Faculty of Engineering, University of Kiel, Kiel, Germany
Venue:
International Journal of Knowledge-based and Intelligent Engineering Systems
Year:
2012

Abstract

Several feature sets are known for text-independent speaker-identification systems, most of which depend on spectral information. Among the most successful of these is the set of Mel-Frequency Cepstrum Coefficients (MFCC). This paper introduces a new feature set, the Histogram of the DCT-Cepstrum Coefficients, which is inspired by the common use of the MFCC but is simpler and faster to compute. A text-independent speaker-identification system based on the DCT-Cepstrum Histogram and a Gaussian Mixture Model (GMM) is implemented. The new feature set was tested using speech files from the ELSDSR database and the TIMIT corpus. It achieved a speaker-identification accuracy of 100% on 23 speakers from the ELSDSR database and 99% on 630 speakers from the TIMIT corpus.
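
The abstract does not define the feature precisely, so the following is only a minimal sketch of what a histogram of DCT-cepstrum coefficients might look like. It assumes that the DCT-cepstrum of a frame is the DCT of the log magnitude of the frame's DCT (by analogy with the usual cepstrum), and that one normalized histogram is built per coefficient across all frames. The function names, frame size, hop, number of coefficients, bin count, and bin range are all hypothetical choices for illustration, not values from the paper. The GMM stage is not shown.

```python
import math

def dct2(x):
    """Unnormalized DCT-II of a sequence (direct O(n^2) form)."""
    n = len(x)
    return [sum(x[k] * math.cos(math.pi * (k + 0.5) * m / n) for k in range(n))
            for m in range(n)]

def dct_cepstrum(frame, n_coeffs=12, eps=1e-10):
    """Assumed definition: DCT of the log magnitude of the frame's DCT."""
    log_mag = [math.log(abs(s) + eps) for s in dct2(frame)]
    # Skip coefficient 0, which mostly reflects overall level (an assumption).
    return dct2(log_mag)[1:n_coeffs + 1]

def frames(signal, size, hop):
    """Split a signal into overlapping frames; incomplete tail is dropped."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]

def histogram(values, n_bins, lo, hi):
    """Normalized histogram; out-of-range values go to the edge bins."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = min(max(int((v - lo) / width), 0), n_bins - 1)
        counts[idx] += 1
    total = len(values) or 1  # avoid division by zero on empty input
    return [c / total for c in counts]

def histogram_feature(signal, size=64, hop=32, n_coeffs=12,
                      n_bins=10, lo=-50.0, hi=50.0):
    """Concatenate one histogram per cepstral coefficient (hypothetical layout)."""
    ceps = [dct_cepstrum(f, n_coeffs) for f in frames(signal, size, hop)]
    feature = []
    for j in range(n_coeffs):
        feature.extend(histogram([c[j] for c in ceps], n_bins, lo, hi))
    return feature

if __name__ == "__main__":
    # Synthetic two-tone signal standing in for a speech recording.
    signal = [math.sin(0.1 * i) + 0.5 * math.sin(0.37 * i) for i in range(512)]
    feature = histogram_feature(signal)
    print(len(feature))
```

In a full system along the lines the abstract describes, feature vectors of this kind would be used to train one GMM per speaker, and a test utterance would be assigned to the speaker whose model gives the highest likelihood.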