Advances in transcription of broadcast news and conversational telephone speech within the combined EARS BBN/LIMSI system

Authors:
S. Matsoukas; J.-L. Gauvain; G. Adda; T. Colthurst; Chia-Lin Kao; O. Kimball; L. Lamel; F. Lefevre; J. Z. Ma; J. Makhoul; L. Nguyen; R. Prasad; R. Schwartz; H. Schwenk; Bing Xiang
Affiliations:
BBN Technol., Cambridge, MA
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2006

Citing 0
Cited 4

The application of hidden Markov models in speech recognition
Foundations and Trends in Signal Processing

Speaker-adaptive speech recognition using speaker diarization for improved transcription of large spoken archives
Speech Communication

Unsupervised training of an HMM-based self-organizing unit recognizer with applications to topic classification and keyword discovery
Computer Speech and Language

Effect of acoustic and linguistic contexts on human and machine speech recognition
Computer Speech and Language

Abstract

This paper describes the progress made in the transcription of broadcast news (BN) and conversational telephone speech (CTS) within the combined BBN/LIMSI system from May 2002 to September 2004. During that period, BBN and LIMSI collaborated to produce significant reductions in word error rate (WER), as directed by the aggressive goals of the Defense Advanced Research Projects Agency (DARPA) Effective, Affordable, Reusable Speech-to-text (EARS) program. The paper focuses on general modeling techniques that led to improvements in recognition accuracy, as well as engineering approaches that enabled efficient use of large amounts of training data and fast decoding architectures. Special attention is given to efforts to integrate components of the BBN and LIMSI systems, and the tradeoff between speed and accuracy is discussed for various system combination strategies. Results on the EARS progress test sets show that the combined BBN/LIMSI system achieved relative WER reductions of 47% and 51% on the BN and CTS domains, respectively.
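
For readers unfamiliar with the metric, the following minimal sketch shows how a word error rate and a relative reduction of the kind reported above are conventionally computed. It is illustrative only and is not the scoring tool used in the EARS evaluations: the function names `word_error_rate` and `relative_reduction` are hypothetical, text normalization is reduced to whitespace splitting, and the numbers in the usage note are invented rather than taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Computed as the word-level Levenshtein distance between the two strings.
    Assumption: words are separated by whitespace, with no other normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline error removed by the new system."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

As a purely hypothetical usage example, a system that lowers WER from 20.0% to 10.0% has `relative_reduction(20.0, 10.0)` equal to 0.5, that is, a 50% relative reduction, even though the absolute reduction is only 10 points.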