Speaker recognition utilizing distributed DCT-II based Mel frequency cepstral coefficients and fuzzy vector quantization

Authors:
M. Afzal Hossan;Mark A. Gregory
Affiliations:
RMIT University, Melbourne, Australia;RMIT University, Melbourne, Australia
Venue:
International Journal of Speech Technology
Year:
2013


Abstract

In this paper, a novel Automatic Speaker Recognition (ASR) system is presented. The system introduces new feature extraction and vector classification steps that use distributed Discrete Cosine Transform (DCT-II) based Mel Frequency Cepstral Coefficients (MFCC) and Fuzzy Vector Quantization (FVQ). The ASR algorithm uses an MFCC-based approach to identify dynamic features that are used for Speaker Recognition (SR). A series of experiments was performed with three feature extraction methods: (1) conventional MFCC; (2) Delta-Delta MFCC (DDMFCC); and (3) DCT-II based DDMFCC. The experiments were then expanded to include four classifiers: (1) FVQ; (2) K-means Vector Quantization (VQ); (3) Linde, Buzo and Gray (LBG) VQ; and (4) Gaussian Mixture Model (GMM). The combination of DCT-II based MFCC, Delta MFCC (DMFCC) and DDMFCC with FVQ was found to have the lowest Equal Error Rate (EER) among the VQ-based classifiers. These results improved on previously reported non-GMM methods and approached those achieved by the computationally expensive GMM-based method. Speaker verification tests highlighted the overall performance improvement of the new ASR system. The National Institute of Standards and Technology (NIST) Speaker Recognition Evaluation corpora were used to provide speaker source data for the experiments.
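
The abstract does not spell out the FVQ formulation, so the following is only a minimal sketch of how fuzzy vector quantization is commonly scored. It assumes the standard fuzzy c-means membership rule with fuzziness exponent m, which may differ from the authors' method. The function names, the default m = 2.0, and the toy codebooks are hypothetical and chosen for illustration. Each feature vector is given a soft membership in every codeword, and a speaker's codebook is scored by the membership-weighted average distortion over all frames.

```python
def squared_distance(x, c):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def fuzzy_memberships(x, codebook, m=2.0):
    """Fuzzy c-means memberships of vector x in each codeword (they sum to 1).

    Assumption: standard fuzzy c-means rule with fuzziness exponent m > 1.
    """
    d = [squared_distance(x, c) for c in codebook]
    # A frame that coincides with one or more codewords belongs fully to them.
    zeros = [i for i, di in enumerate(d) if di == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(d))]
    power = 1.0 / (m - 1.0)
    inverse = [(1.0 / di) ** power for di in d]
    total = sum(inverse)
    return [v / total for v in inverse]


def fuzzy_distortion(frames, codebook, m=2.0):
    """Average membership-weighted distortion of frames against a codebook."""
    total = 0.0
    for x in frames:
        u = fuzzy_memberships(x, codebook, m)
        total += sum((ui ** m) * squared_distance(x, c)
                     for ui, c in zip(u, codebook))
    return total / len(frames)


if __name__ == "__main__":
    # Hypothetical two-dimensional "feature vectors" and speaker codebooks.
    frames = [(0.1, 0.0), (0.9, 1.1), (0.0, 0.2)]
    codebooks = {
        "speaker_a": [(0.0, 0.0), (1.0, 1.0)],
        "speaker_b": [(5.0, 5.0), (6.0, 4.0)],
    }
    scores = {name: fuzzy_distortion(frames, cb) for name, cb in codebooks.items()}
    # In identification, the codebook with the lowest distortion is selected.
    print(min(scores, key=scores.get), scores)
```

In a verification setting, the distortion would instead be compared against a threshold. Sweeping that threshold until the false acceptance and false rejection rates are equal gives the Equal Error Rate reported in the abstract.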