Conditional Random Fields for Integrating Local Discriminative Classifiers

Authors: J. Morris; E. Fosler-Lussier
Affiliations: Ohio State Univ., Columbus
Venue: IEEE Transactions on Audio, Speech, and Language Processing
Year: 2008

Abstract

Conditional random fields (CRFs) are a statistical framework that has recently gained popularity in both the automatic speech recognition (ASR) and natural language processing communities because they make different assumptions when predicting label sequences than the more traditional hidden Markov model (HMM). In the ASR community, CRFs have been employed in a manner similar to HMMs, using the sufficient statistics of the input data to compute the probability of label sequences given acoustic input. In this paper, we explore the application of CRFs to combine local posterior estimates provided by multilayer perceptrons (MLPs) corresponding to the frame-level prediction of phone classes and phonological attribute classes. We compare phonetic recognition using CRFs to an HMM system trained on the same input features and show that the monophone-label CRF achieves performance superior to a monophone-based HMM and comparable to a 16-Gaussian-mixture triphone-based HMM; in both cases, the CRF obtains these results with far fewer free parameters. The CRF is also better able to combine these posterior estimators, achieving a substantial increase in performance over an HMM-based triphone system by mixing the two highly correlated sets of phone class and phonetic attribute class posteriors.
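
The general idea of feeding frame-level posteriors into a CRF can be illustrated with a minimal sketch of a standard linear-chain CRF. This is not the authors' implementation: the function names, the weights, and the toy posterior values below are all hypothetical, and the sketch assumes that each frame's feature vector is simply the phone-class posteriors concatenated with the attribute-class posteriors. It computes log P(labels | frames) as the score of one label sequence minus the log partition function, which is obtained with the forward recursion.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def state_scores(posteriors, state_weights):
    """Per-frame, per-label scores: a weighted sum of that frame's posteriors.

    posteriors:    T frames, each a list of F posterior features
    state_weights: one list of F weights per label (hypothetical values)
    """
    return [
        [sum(w * p for w, p in zip(weights, frame)) for weights in state_weights]
        for frame in posteriors
    ]


def sequence_score(labels, scores, trans):
    """Unnormalized log score of one label sequence."""
    total = scores[0][labels[0]]
    for t in range(1, len(labels)):
        total += trans[labels[t - 1]][labels[t]] + scores[t][labels[t]]
    return total


def log_partition(scores, trans):
    """Log of the sum of exp(score) over all label sequences (forward recursion)."""
    n_labels = len(trans)
    alpha = list(scores[0])
    for t in range(1, len(scores)):
        alpha = [
            scores[t][j]
            + logsumexp([alpha[i] + trans[i][j] for i in range(n_labels)])
            for j in range(n_labels)
        ]
    return logsumexp(alpha)


def log_prob(labels, posteriors, state_weights, trans):
    """log P(labels | posteriors) under the linear-chain CRF sketched here."""
    scores = state_scores(posteriors, state_weights)
    return sequence_score(labels, scores, trans) - log_partition(scores, trans)


# Toy input (assumed values): 3 frames, each with 2 phone-class posteriors
# followed by 2 attribute-class posteriors.
frames = [
    [0.9, 0.1, 0.8, 0.2],
    [0.6, 0.4, 0.5, 0.5],
    [0.2, 0.8, 0.1, 0.9],
]
# One weight vector per label (2 labels, 4 features); hypothetical values.
weights = [
    [2.0, -1.0, 1.5, -0.5],
    [-1.0, 2.0, -0.5, 1.5],
]
# Transition weights trans[previous_label][current_label]; hypothetical values.
transitions = [
    [0.5, -0.5],
    [-0.5, 0.5],
]

example_log_prob = log_prob([0, 0, 1], frames, weights, transitions)
```

Because the model is normalized over whole label sequences, correlated feature streams such as the two posterior sets can share weight without any independence assumption between them, which is the property the abstract appeals to when it describes mixing the two sets.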