The IBM RT07 Evaluation Systems for Speaker Diarization on Lecture Meetings. Multimodal Technologies for Perception of Humans.
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Computer Speech and Language
Unsupervised model adaptation using information-theoretic criterion. HLT '10: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics.
Advances in Mandarin broadcast speech transcription at IBM under the DARPA GALE program. ISCSLP '06: Proceedings of the 5th International Conference on Chinese Spoken Language Processing.
Syntactic decision tree LMs: random selection or intelligent design? EMNLP '11: Proceedings of the Conference on Empirical Methods in Natural Language Processing.
Implicitly intersecting weighted automata using dual decomposition. NAACL HLT '12: Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies.
Fast syntactic analysis for statistical language modeling via substructure sharing and uptraining. ACL '12: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers - Volume 1.
Revisiting the case for explicit syntactic information in language models. WLM '12: Proceedings of the NAACL-HLT 2012 Workshop: Will We Ever Really Replace the N-gram Model? On the Future of Language Modeling for HLT.
Direct construction of compact context-dependency transducers from data. Computer Speech and Language.
IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP)
This paper describes the technical and system-building advances made in IBM's speech recognition technology over the course of the Defense Advanced Research Projects Agency (DARPA) Effective Affordable Reusable Speech-to-Text (EARS) program. At a technical level, these advances include the development of a new form of feature-based minimum phone error training (fMPE), the use of large-scale discriminatively trained full-covariance Gaussian models, the use of septaphone acoustic context in static decoding graphs, and improvements in basic decoding algorithms. At a system-building level, the advances include a system architecture based on cross-adaptation and the incorporation of 2100 hours of training data in every system component. We present results on English conversational telephony test data from the 2003 and 2004 NIST evaluations. The combination of technical advances and an order of magnitude more training data in 2004 reduced the error rate on the 2003 test set by approximately 21% relative (from 20.4% to 16.1%) over the most accurate system in the 2003 evaluation, and produced the most accurate results on the 2004 test sets in every speed category.
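As a quick check on the headline number, the relative error-rate reduction can be computed directly from the two word-error rates quoted in the abstract. The sketch below uses only those reported figures (20.4% and 16.1%); the function name is illustrative, not from the paper.

```python
# Relative reduction in word error rate (WER), using the abstract's figures:
# the 2003 test-set WER fell from 20.4% (2003 system) to 16.1% (2004 system).
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the relative reduction in percent: 100 * (baseline - improved) / baseline."""
    return 100.0 * (baseline - improved) / baseline

rel = relative_reduction(20.4, 16.1)
print(f"{rel:.1f}% relative")  # about 21% relative, matching the reported figure
```

Note that "21% relative" measures the drop as a fraction of the baseline rate, not the 4.3-point absolute difference between the two error rates.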