Discriminative input stream combination for conditional random field phone recognition

Authors:
Ilana Heintz;Eric Fosler-Lussier;Chris Brew
Affiliations:
Department of Linguistics and Department of Computer Science and Engineering, The Ohio State University, Columbus, OH (all authors)
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2009



Abstract

In recent studies, we and others have found that conditional random fields (CRFs) can be used effectively for phone classification and recognition by combining non-Gaussian-distributed representations of acoustic input. In previous work (I. Heintz et al., "Latent phonetic analysis: Use of singular value decomposition to determine features for CRF phone recognition," Proc. ICASSP, pp. 4541-4544, 2008), we experimented with combining phonological feature posterior estimators and phone posterior estimators within a CRF framework; we found that treating posterior estimates as terms in a "phoneme information retrieval" task allowed more effective use of multiple posterior streams than feeding these acoustic representations directly to the CRF recognizer. In this paper, we examine some of the design choices in our previous work and extend our results to as many as six acoustic feature streams. We concentrate on feature design, rather than feature selection, to find the best way of combining features for introduction into a log-linear model. Improving upon our previous work, we find that several different dimensionality reduction techniques (SVD, PARAFAC2, KLT), each followed by a nonlinear transform provided by a multilayer perceptron, provide a significant gain in phone recognition accuracy on the TIMIT task.
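
Illustrative note: the stream-combination idea in the abstract can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. It shows only the SVD step, applied LSA-style (uncentered) to concatenated per-frame posterior streams. The function name `svd_project`, the stream sizes, and the random toy data are all hypothetical, and the multilayer perceptron and CRF stages are omitted.

```python
import numpy as np

def svd_project(streams, k):
    """Concatenate per-frame posterior streams and project them onto the
    top-k right singular vectors (uncentered, LSA-style; an assumption)."""
    X = np.hstack(streams)                      # shape: (frames, total_dims)
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    basis = Vt[:k].T                            # shape: (total_dims, k)
    return X @ basis, basis

rng = np.random.default_rng(0)
# Hypothetical toy data: 200 frames, a 5-way "phone" posterior stream
# and a 3-way "phonological feature" posterior stream.
phone_post = rng.dirichlet(np.ones(5), size=200)
feat_post = rng.dirichlet(np.ones(3), size=200)

reduced, basis = svd_project([phone_post, feat_post], k=4)
```

In a pipeline of the kind the abstract describes, the reduced vectors would then be passed through a trained nonlinear transform before reaching the log-linear model; that stage is left out here because it depends on training details not given in this record.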