Training data selection for improving discriminative training of acoustic models

Authors:
Berlin Chen;Shih-Hung Liu;Fang-Hui Chu
Affiliations:
Department of Computer Science and Information Engineering, National Taiwan Normal University, Taipei 116, Taiwan;Department of Computer Science and Information Engineering, National Taiwan Normal University, Taipei 116, Taiwan;Department of Computer Science and Information Engineering, National Taiwan Normal University, Taipei 116, Taiwan
Venue:
Pattern Recognition Letters
Year:
2009

Abstract

This paper considers training data selection for discriminative training of acoustic models for large vocabulary continuous speech recognition (LVCSR). Three novel data selection approaches are proposed. First, the average phone accuracy over all hypothesized word sequences in the word lattice of a training utterance is used for utterance-level data selection. Second, phone-level data selection based on the difference between the expected accuracy of a phone arc and the average phone accuracy of the word lattice is investigated. Finally, frame-level data selection based on the normalized frame-level entropy of Gaussian posterior probabilities obtained from the word lattice is explored. The underlying characteristics of the presented approaches are extensively investigated, and their performance is verified by comparison with standard discriminative training approaches. Experiments conducted on a broadcast news speech transcription task show that phone- and frame-level data selection can reduce the turnaround time of acoustic model training by more than half while yielding discriminative acoustic models of comparable quality.
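The abstract names the three selection criteria but does not give the exact selection rules. What follows is a minimal sketch, assuming a toy lattice stored as an explicit list of paths, each carrying a posterior and a raw phone accuracy. A real system would compute these quantities with forward-backward over the word lattice instead. The sketch computes the average lattice accuracy c_avg, the expected accuracy c(q) of each phone arc, and the normalized entropy of a frame's Gaussian posteriors. The data structures, the function names, the thresholds (`max_avg_acc`, `min_margin`, `min_entropy`) and the direction of each comparison are hypothetical illustrations, not the authors' method.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Set, Tuple

# Toy arc identifier: (phone label, start frame, end frame). Hypothetical representation.
Arc = Tuple[str, int, int]


@dataclass
class Path:
    arcs: List[Arc]   # phone arcs along one hypothesized word sequence
    posterior: float  # P(path | utterance); posteriors of all paths sum to 1
    accuracy: float   # raw phone accuracy of the path w.r.t. the reference


def average_accuracy(paths: Sequence[Path]) -> float:
    """c_avg: posterior-weighted phone accuracy over all lattice paths."""
    return sum(p.posterior * p.accuracy for p in paths)


def arc_expected_accuracy(paths: Sequence[Path]) -> Dict[Arc, float]:
    """c(q): expected accuracy of the paths that pass through each arc q."""
    mass: Dict[Arc, float] = {}
    weighted: Dict[Arc, float] = {}
    for p in paths:
        for q in set(p.arcs):
            mass[q] = mass.get(q, 0.0) + p.posterior
            weighted[q] = weighted.get(q, 0.0) + p.posterior * p.accuracy
    return {q: weighted[q] / mass[q] for q in mass if mass[q] > 0.0}


def normalized_entropy(gauss_posteriors: Sequence[float]) -> float:
    """Entropy of one frame's Gaussian posteriors divided by log(N), in [0, 1]."""
    n = len(gauss_posteriors)
    if n < 2:
        return 0.0
    total = sum(gauss_posteriors)
    h = 0.0
    for g in gauss_posteriors:
        if g > 0.0:
            p = g / total
            h -= p * math.log(p)
    return h / math.log(n)


# --- Selection rules: thresholds and comparison directions are assumptions. ---

def select_utterance(paths: Sequence[Path], ref_phones: int,
                     max_avg_acc: float = 0.95) -> bool:
    """Keep an utterance unless its lattice is already almost fully correct."""
    return average_accuracy(paths) / ref_phones < max_avg_acc


def select_arcs(paths: Sequence[Path], min_margin: float = 0.1) -> Set[Arc]:
    """Keep arcs whose c(q) differs from c_avg by at least min_margin."""
    c_avg = average_accuracy(paths)
    return {q for q, c in arc_expected_accuracy(paths).items()
            if abs(c - c_avg) >= min_margin}


def select_frames(frames: Sequence[Sequence[float]],
                  min_entropy: float = 0.5) -> List[int]:
    """Keep indices of frames whose Gaussian posteriors are confusable."""
    return [t for t, post in enumerate(frames)
            if normalized_entropy(post) >= min_entropy]


# Toy lattice for the reference phone sequence "a b" (two phones).
paths = [
    Path(arcs=[("a", 0, 10), ("b", 10, 20)], posterior=0.6, accuracy=2.0),
    Path(arcs=[("a", 0, 10), ("c", 10, 20)], posterior=0.3, accuracy=1.0),
    Path(arcs=[("d", 0, 10), ("b", 10, 20)], posterior=0.1, accuracy=1.0),
]

if __name__ == "__main__":
    print(average_accuracy(paths))     # about 1.6
    print(sorted(select_arcs(paths)))  # arc ("a", 0, 10) is dropped
    print(select_frames([[0.5, 0.5], [1.0, 0.0], [0.7, 0.2, 0.1]]))
```

In this toy lattice c_avg is 1.6. Arc `a` has an expected accuracy of about 1.67, so paths through it are barely better than the lattice average, and it falls below `min_margin=0.1`. In MPE-style training, an arc's contribution to the update scales with γ_q(c(q) − c_avg), so such arcs contribute little. That fact motivates the hypothetical "drop small differences" rule used here. The paper's actual rule may differ.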