In recent studies, we and others have found that conditional random fields (CRFs) can be used effectively for phone classification and recognition by combining non-Gaussian-distributed representations of the acoustic input. In previous work (I. Heintz et al., "Latent phonetic analysis: Use of singular value decomposition to determine features for CRF phone recognition," Proc. ICASSP, pp. 4541-4544, 2008), we combined phonological feature posterior estimators and phone posterior estimators within a CRF framework. We found that treating the posterior estimates as terms in a "phoneme information retrieval" task made more effective use of multiple posterior streams than feeding these acoustic representations directly to the CRF recognizer. In this paper, we examine some of the design choices made in that work and extend our results to as many as six acoustic feature streams. We concentrate on feature design rather than feature selection, seeking the best way to combine features before they enter a log-linear model. Improving on our previous results, we find that several different dimensionality reduction techniques (SVD, PARAFAC2, KLT), each followed by a nonlinear transform provided by a multilayer perceptron, provide a significant gain in phone recognition accuracy on the TIMIT task.
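The "information retrieval" view treats each frame's stacked posterior vector like a document's term vector, so reducing it resembles latent semantic analysis. The sketch below, assuming NumPy, contrasts two of the reductions named above: an SVD of the uncentred frame-by-posterior matrix (LSA-style) and a KLT, i.e. principal component analysis of mean-centred data. The stream sizes (61 phone classes, 8 and 3 phonological feature values), k=20, the helper names, and the Dirichlet-sampled stand-in posteriors are all hypothetical; whether the original work used raw or log posteriors, or centred the data before the SVD, is not stated here. PARAFAC2 is not shown.

```python
import numpy as np


def stack_streams(streams):
    """Concatenate per-frame posterior streams (each n_frames x dims) column-wise."""
    return np.hstack(streams)


def svd_basis(X_train, k):
    """LSA-style basis: top-k right singular vectors of the uncentred matrix."""
    # X_train = U @ diag(S) @ Vt; singular values come back in descending order,
    # so the first k rows of Vt span the dominant directions.
    _, _, vt = np.linalg.svd(X_train, full_matrices=False)
    return vt[:k].T  # shape (dims, k)


def klt_basis(X_train, k):
    """KLT (PCA) basis: top-k eigenvectors of the covariance of mean-centred data."""
    mean = X_train.mean(axis=0)
    cov = np.cov(X_train - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]
    return mean, eigvecs[:, order]


rng = np.random.default_rng(0)
n_frames, k = 200, 20
# Hypothetical stand-ins for one phone-posterior stream (61 classes) and two
# phonological-feature posterior streams; every row is a probability vector.
streams = [rng.dirichlet(np.ones(d), size=n_frames) for d in (61, 8, 3)]
X = stack_streams(streams)  # (200, 72)

B_svd = svd_basis(X, k)
Z_svd = X @ B_svd  # LSA-style reduced features, (200, 20)

mean, B_klt = klt_basis(X, k)
Z_klt = (X - mean) @ B_klt  # KLT-reduced features, (200, 20)
```

In practice the basis would be estimated on training frames only, and the same basis (plus, for the KLT, the same training mean) applied unchanged to test frames.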
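The reduced features then pass through a nonlinear transform provided by a multilayer perceptron before entering the log-linear (CRF) model. A minimal sketch, assuming a single tanh hidden layer with a softmax output over phone classes, trained by full-batch gradient descent on cross-entropy: the layer sizes, learning rate, function names, and synthetic data are hypothetical, and the paper's actual MLP configuration is not given here. The resulting per-frame posteriors are what would be handed to the CRF as observation features.

```python
import numpy as np


def softmax(a):
    a = a - a.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)


def mlp_posteriors(Z, params):
    """Forward pass: tanh hidden layer, softmax output over phone classes."""
    W1, b1, W2, b2 = params
    return softmax(np.tanh(Z @ W1 + b1) @ W2 + b2)


def cross_entropy(P, y):
    return -np.mean(np.log(P[np.arange(len(y)), y]))


def train_mlp(Z, y, n_classes, n_hidden=32, lr=0.1, epochs=200, seed=0):
    """Full-batch gradient descent on mean cross-entropy."""
    rng = np.random.default_rng(seed)
    n, d = Z.shape
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), (d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), (n_hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    for _ in range(epochs):
        H = np.tanh(Z @ W1 + b1)
        P = softmax(H @ W2 + b2)
        dA2 = (P - Y) / n  # gradient w.r.t. output pre-activations
        dW2, db2 = H.T @ dA2, dA2.sum(axis=0)
        dH = (dA2 @ W2.T) * (1.0 - H ** 2)  # back through tanh
        dW1, db1 = Z.T @ dH, dH.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1, b1, W2, b2


rng = np.random.default_rng(1)
n_classes = 5
Z = rng.normal(size=(300, 20))  # hypothetical reduced features (e.g. from SVD or KLT)
y = np.argmax(Z @ rng.normal(size=(20, n_classes)), axis=1)  # synthetic frame labels

# epochs=0 returns the untrained initial weights (same seed), for comparison.
loss_before = cross_entropy(mlp_posteriors(Z, train_mlp(Z, y, n_classes, epochs=0)), y)
params = train_mlp(Z, y, n_classes)
P = mlp_posteriors(Z, params)  # per-frame posteriors to pass to the CRF as features
loss_after = cross_entropy(P, y)
```

One motivation for this design is that the linear reductions alone produce unbounded, roughly Gaussian-shaped projections, while the MLP turns them back into calibrated class posteriors of the kind the CRF already consumes.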