Fundamentals of speech recognition
Modeling pronunciation variation for ASR: a survey of the literature
Speech Communication - Special issue on modeling pronunciation variation for automatic speech recognition
Heterogeneous acoustic measurements and multiple classifiers for speech recognition
Pattern Classification (2nd Edition)
Local Discriminant Embedding and Its Variants
CVPR '05 Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05) - Volume 2
IEEE Transactions on Pattern Analysis and Machine Intelligence
Graph Embedding and Extensions: A General Framework for Dimensionality Reduction
IEEE Transactions on Pattern Analysis and Machine Intelligence
Dimensionality Reduction of Multimodal Labeled Data by Local Fisher Discriminant Analysis
The Journal of Machine Learning Research
ICASSP '09 Proceedings of the 2009 IEEE International Conference on Acoustics, Speech and Signal Processing
Locality sensitive discriminant analysis
IJCAI'07 Proceedings of the 20th International Joint Conference on Artificial Intelligence
Neighbor-weighted K-nearest neighbor for unbalanced text corpus
Expert Systems with Applications: An International Journal
Manifold learning-based feature transformation for phone classification
NOLISP'07 Proceedings of the 2007 international conference on Advances in nonlinear speech processing
Trajectory Clustering for Solving the Trajectory Folding Problem in Automatic Speech Recognition
IEEE Transactions on Audio, Speech, and Language Processing
Automatic speech recognition (ASR) depends crucially on acoustic models for speech units such as phones. A drawback of popular acoustic models is that they do not capture speech continuity information. Stacking the short-term features of consecutive frames can retain sufficient articulatory information; unfortunately, the resulting high-dimensional feature space still contains much redundant information and also invites the curse of dimensionality in subsequent acoustic modeling. Motivated by this and by recent research [4, 15], this paper investigates supervised dimensionality reduction methods to answer two questions: whether local structures exist in the feature space formed by stacking frames, and whether those local structures help acoustic modeling. Experimental results on TIMIT phonetic classification show that the assumed local structures do exist in the feature space and are best described by nearest neighbor graphs.
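The frame-stacking step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the context width of 4 frames, the edge-padding strategy, and the 13-dimensional MFCC input are all assumptions for the example.

```python
import numpy as np

def stack_frames(features, context=4):
    """Stack each frame with its `context` left and right neighbors.

    `features` is a (num_frames, dim) array of short-term features
    (e.g. MFCCs). Boundary frames are repeated so every frame gets a
    full window. Returns a (num_frames, dim * (2*context + 1)) array,
    the high-dimensional space on which dimensionality reduction
    would then operate.
    """
    # Pad by repeating the first/last frame `context` times.
    padded = np.concatenate(
        [np.repeat(features[:1], context, axis=0),
         features,
         np.repeat(features[-1:], context, axis=0)],
        axis=0,
    )
    window = 2 * context + 1
    # Shifted copies of the sequence, concatenated along the feature axis.
    return np.concatenate(
        [padded[i:i + len(features)] for i in range(window)],
        axis=1,
    )

# 100 frames of 13-dimensional MFCCs -> 117-dimensional stacked vectors
X = np.random.randn(100, 13)
print(stack_frames(X).shape)  # (100, 117)
```

With a context of 4 frames on either side, a 13-dimensional frame becomes a 117-dimensional vector, which illustrates why redundancy and the curse of dimensionality become concerns for the subsequent acoustic model.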