Acoustic factor analysis for streamed hidden Markov modeling

Authors:
Jen-Tzung Chien;Chuan-Wei Ting
Affiliations:
Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan;Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2009

Abstract

This paper presents a novel streamed hidden Markov model (HMM) framework for speech recognition. The factor analysis (FA) principle is adopted to extract the common factors from acoustic features. The streaming regularities used in building HMMs are governed by the correlation between cepstral features, which is captured by the common factors. Features corresponding to the same factor are generated by the same HMM state. Accordingly, multiple Markov chains are adopted to characterize the variation trends in different dimensions of the cepstral vectors. An FA streamed HMM (FASHMM) method is developed to relax the assumption of the standard HMM topology, namely, that all features of a speech frame are emitted by the same state. The proposed FASHMM is more flexible than the streamed factorial HMM (SFHMM), in which the streaming is determined empirically. To reduce the number of factor loading matrices in FA, the similarity between individual matrices is evaluated to find the optimal parameter clustering of the FA models. A new decoding algorithm is presented to perform FASHMM speech recognition. FASHMM runs the streamed Markov chains over a sequence of multivariate Gaussian mixture observations through the state transitions of the partitioned vectors. In the experiments, the proposed method significantly reduces the recognition error rates compared with the standard HMM and SFHMM methods.
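
The idea of grouping feature dimensions into streams by their common factor can be illustrated with a small sketch. It is not the authors' estimation procedure, which the abstract does not specify. As a stand-in, it approximates the factor loadings by a principal-component decomposition of the feature correlation matrix and assigns each dimension to the factor on which it loads most heavily. The function name `partition_features_by_factor` and its parameters are hypothetical.

```python
import numpy as np


def partition_features_by_factor(X, n_factors):
    """Group feature dimensions into streams by their dominant factor.

    X is an (n_frames, n_dims) array of feature vectors. Returns a list of
    n_factors index arrays, one per stream. Assumption: loadings are
    approximated by principal components of the correlation matrix, which
    stands in for a proper maximum-likelihood factor analysis.
    """
    # Correlation between feature dimensions (columns of X).
    R = np.corrcoef(X, rowvar=False)
    # Eigen-decomposition of the symmetric correlation matrix.
    eigvals, eigvecs = np.linalg.eigh(R)
    # Keep the n_factors directions with the largest eigenvalues.
    top = np.argsort(eigvals)[::-1][:n_factors]
    loadings = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))
    # Each dimension joins the factor with the largest absolute loading.
    labels = np.argmax(np.abs(loadings), axis=1)
    return [np.flatnonzero(labels == k) for k in range(n_factors)]
```

In a streamed model of the kind the abstract describes, each returned index group would be modeled by its own Markov chain. How the streams are actually estimated and how the loading matrices are clustered is left to the paper itself.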