A clustering based feature selection method in spectro-temporal domain for speech recognition

Authors:
Nafiseh Esfandian;Farbod Razzazi;Alireza Behrad
Affiliations:
Department of Electrical and Computer Engineering, Islamic Azad University, Science and Research Branch, Tehran, Iran;Department of Electrical and Computer Engineering, Islamic Azad University, Science and Research Branch, Tehran, Iran;Faculty of Engineering, Shahed University, Tehran, Iran
Venue:
Engineering Applications of Artificial Intelligence
Year:
2012

Abstract

Spectro-temporal representation of speech has become one of the leading signal representation approaches in speech recognition systems in recent years. This representation suffers from the high dimensionality of its feature space, which makes the domain unsuitable for practical speech recognition systems. In this paper, a new clustering-based method is proposed for secondary feature selection/extraction in the spectro-temporal domain. In the proposed representation, Gaussian mixture model (GMM) and weighted K-means (WKM) clustering techniques are applied in the spectro-temporal domain to reduce the dimensionality of the feature space. The elements of the centroid vectors and covariance matrices of the clusters are taken as the attributes of the secondary feature vector of each frame. To evaluate the proposed approach, the new feature vectors were tested on phoneme classification within the main phoneme categories of the TIMIT database. The proposed secondary feature vector yielded a significant improvement in the classification rate of different sets of phonemes compared with MFCC features. The average improvement in the classification rate of voiced plosives over MFCC features is 5.9% using WKM clustering and 6.4% using GMM clustering. The greatest improvement, about 7.4%, is obtained using WKM clustering in the classification of front vowels.
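
The idea of building a frame's secondary feature vector from cluster centroids and covariances can be illustrated with a minimal sketch. The abstract does not say how points and weights are derived from the spectro-temporal representation, how clusters are initialised, or how they are ordered, so everything below is an assumption made for illustration: each frame is taken to be a list of points with non-negative weights (for example, response magnitudes), initialisation is a deterministic farthest-first rule, only the diagonal of each covariance matrix is kept, and clusters are sorted by centroid so the vector layout is stable. The function names `weighted_kmeans` and `secondary_features` are hypothetical and not taken from the paper.

```python
def _sqdist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _init_centroids(points, weights, k):
    """Farthest-first initialisation (an assumption, not the paper's rule)."""
    first = max(range(len(points)), key=lambda i: weights[i])
    centroids = [list(points[first])]
    while len(centroids) < k:
        far = max(range(len(points)),
                  key=lambda i: min(_sqdist(points[i], c) for c in centroids))
        centroids.append(list(points[far]))
    return centroids


def weighted_kmeans(points, weights, k, iters=100):
    """Lloyd-style k-means in which each centroid is a weighted mean."""
    dim = len(points[0])
    centroids = _init_centroids(points, weights, k)
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centroid by squared distance.
        new_labels = [min(range(k), key=lambda c: _sqdist(p, centroids[c]))
                      for p in points]
        # Update step: weighted mean of the members of each cluster.
        for c in range(k):
            members = [i for i, lab in enumerate(new_labels) if lab == c]
            total = sum(weights[i] for i in members)
            if total > 0:  # an empty cluster keeps its previous centroid
                centroids[c] = [
                    sum(weights[i] * points[i][d] for i in members) / total
                    for d in range(dim)
                ]
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels


def secondary_features(points, weights, k):
    """Concatenate each cluster's centroid and weighted per-axis variance."""
    centroids, labels = weighted_kmeans(points, weights, k)
    dim = len(points[0])
    features = []
    # Sort clusters by centroid so the feature layout is deterministic.
    for c in sorted(range(k), key=lambda c: centroids[c]):
        members = [i for i, lab in enumerate(labels) if lab == c]
        total = sum(weights[i] for i in members)
        variances = [0.0] * dim
        if total > 0:
            variances = [
                sum(weights[i] * (points[i][d] - centroids[c][d]) ** 2
                    for i in members) / total
                for d in range(dim)
            ]
        features.extend(centroids[c])
        features.extend(variances)
    return features
```

For `k` clusters in a `d`-dimensional space this sketch yields a vector of length `2 * k * d` per frame, regardless of how many points the frame contains. A GMM-based variant would replace the hard assignments with posterior responsibilities and could keep full covariance matrices instead of the diagonal.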