Fundamentals of speech recognition
Modeling pronunciation variation for ASR: a survey of the literature
Speech Communication - Special issue on modeling pronunciation variation for automatic speech recognition
Heterogeneous acoustic measurements and multiple classifiers for speech recognition
Pattern Classification (2nd Edition)
Local Discriminant Embedding and Its Variants
CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 2
IEEE Transactions on Pattern Analysis and Machine Intelligence
Graph Embedding and Extensions: A General Framework for Dimensionality Reduction
IEEE Transactions on Pattern Analysis and Machine Intelligence
Dimensionality Reduction of Multimodal Labeled Data by Local Fisher Discriminant Analysis
The Journal of Machine Learning Research
ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
Locality sensitive discriminant analysis
IJCAI'07 Proceedings of the 20th International Joint Conference on Artificial Intelligence
Neighbor-weighted K-nearest neighbor for unbalanced text corpus
Expert Systems with Applications: An International Journal
Manifold learning-based feature transformation for phone classification
NOLISP'07 Proceedings of the 2007 international conference on Advances in nonlinear speech processing
Trajectory Clustering for Solving the Trajectory Folding Problem in Automatic Speech Recognition
IEEE Transactions on Audio, Speech, and Language Processing
Automatic speech recognition (ASR) depends crucially on acoustic models for speech units such as phones. A drawback of popular acoustic models is that they do not capture speech continuity information. Stacking the short-term features of consecutive frames can retain sufficient articulatory information; unfortunately, the resulting high-dimensional feature space still contains much redundant information and also invites the curse of dimensionality in subsequent acoustic modeling. Motivated by this and by recent research [4, 15], this paper investigates supervised dimensionality reduction methods to answer two questions: whether local structures exist in the feature space formed by stacking frames, and whether those local structures help acoustic modeling. Experimental results on TIMIT phonetic classification show that the assumed local structures do exist in the feature space and are best described by nearest neighbor graphs.
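The frame-stacking step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the context width of 4 frames, the edge-padding strategy, and the 13-dimensional MFCC input are all assumptions for the example.

```python
import numpy as np

def stack_frames(features, context=4):
    """Stack each frame with its `context` left and right neighbors.

    `features` is a (num_frames, dim) array of short-term features
    (e.g. MFCCs). Boundary frames are repeated so every frame gets a
    full window. Returns a (num_frames, dim * (2*context + 1)) array,
    the high-dimensional space on which dimensionality reduction
    would then operate.
    """
    # Pad by repeating the first/last frame `context` times.
    padded = np.concatenate(
        [np.repeat(features[:1], context, axis=0),
         features,
         np.repeat(features[-1:], context, axis=0)],
        axis=0,
    )
    window = 2 * context + 1
    # Shifted copies of the sequence, concatenated along the feature axis.
    return np.concatenate(
        [padded[i:i + len(features)] for i in range(window)],
        axis=1,
    )

# 100 frames of 13-dimensional MFCCs -> 117-dimensional stacked vectors
X = np.random.randn(100, 13)
print(stack_frames(X).shape)  # (100, 117)
```

With a context of 4 frames on either side, a 13-dimensional frame becomes a 117-dimensional vector, which illustrates why redundancy and the curse of dimensionality become concerns for the subsequent acoustic model.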