This paper presents a novel streamed hidden Markov model (HMM) framework for speech recognition. The factor analysis (FA) principle is adopted to extract the common factors underlying the acoustic features. The partitioning of features into streams is governed by the correlation between cepstral features, which is captured by the common factors: features corresponding to the same factor are generated by the same HMM state. Accordingly, multiple Markov chains are adopted to characterize the variation trends in different dimensions of the cepstral vectors. The resulting FA streamed HMM (FASHMM) relaxes the assumption of the standard HMM topology that all features of a speech frame are emitted by the same state. The proposed FASHMM is more flexible than the streamed factorial HMM (SFHMM), in which the streaming is determined empirically. To reduce the number of factor loading matrices in FA, the similarity between individual matrices is evaluated to find the optimal clustering of FA model parameters. A new decoding algorithm is presented to perform FASHMM speech recognition. FASHMM runs the streamed Markov chains over a sequence of multivariate Gaussian mixture observations through the state transitions of the partitioned vectors. In the experiments, the proposed method significantly reduces the recognition error rate compared with the standard HMM and SFHMM methods.
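
The idea of grouping features that share a common factor into one stream can be illustrated with a minimal sketch. It is not the paper's procedure. It assumes a principal-component approximation to the factor loading matrix, with no rotation and no noise model, and it assigns each feature dimension to the factor on which it loads most strongly. The function name `stream_partition` and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np


def stream_partition(features, n_factors):
    """Group feature dimensions into streams by their dominant factor.

    features: array of shape (n_frames, n_dims), e.g. cepstral vectors.
    Returns a list of index arrays, one per non-empty stream.
    Assumption: loadings are approximated from the correlation matrix
    (principal-component method), not fitted by maximum likelihood.
    """
    # Correlation between feature dimensions (columns).
    corr = np.corrcoef(features, rowvar=False)
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    eigvals, eigvecs = np.linalg.eigh(corr)
    top = np.argsort(eigvals)[::-1][:n_factors]
    # Approximate loading matrix of shape (n_dims, n_factors).
    loadings = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))
    # Each dimension joins the factor with the largest absolute loading.
    owner = np.argmax(np.abs(loadings), axis=1)
    return [np.flatnonzero(owner == k)
            for k in range(n_factors) if np.any(owner == k)]


# Synthetic example: dimensions 0-3 share one latent factor, 4-5 another.
rng = np.random.default_rng(0)
n_frames = 2000
f_a = rng.standard_normal((n_frames, 1))
f_b = rng.standard_normal((n_frames, 1))
clean = np.hstack([np.repeat(f_a, 4, axis=1), np.repeat(f_b, 2, axis=1)])
frames = clean + 0.3 * rng.standard_normal((n_frames, 6))

for stream in stream_partition(frames, n_factors=2):
    print(stream.tolist())
```

In a streamed model along these lines, each returned index group would be assigned its own Markov chain, so dimensions in different groups could occupy different states at the same frame. Without a rotation step, this approximation can mix factors whose eigenvalues are nearly equal, which is why the synthetic streams above have different sizes.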