Progress in the CU-HTK broadcast news transcription system

Authors:
M. J. F. Gales; Do Yeong Kim; P. C. Woodland; Ho Yin Chan; D. Mrva; R. Sinha; S. E. Tranter
Affiliations:
Eng. Dept., Cambridge Univ.;-;-;-;-;-;-
Venue:
IEEE Transactions on Audio, Speech, and Language Processing
Year:
2006


Abstract

Broadcast news (BN) transcription has been a challenging research area for many years. In the last couple of years, the availability of large amounts of roughly transcribed acoustic training data and advanced model training techniques has offered the opportunity to greatly reduce the error rate on this task. This paper describes the design and performance of BN transcription systems which make use of these developments. First, the effects of using lightly supervised training data and advanced acoustic modeling techniques are discussed. The design of a real-time broadcast news recognition system using these new models is then detailed. As system combination has been found to yield large gains in performance, a range of frameworks that allow multiple recognition outputs to be combined are described next. These include the use of multiple types of acoustic models and multiple segmentations. As a contrast, the "SuperEARS" system, which was developed by multiple sites and allows cross-site combination, is also described. The various models and recognition configurations are evaluated using several recent BN development and evaluation test sets. These new BN transcription systems can give gains of over 25% relative to the CU-HTK 2003 BN system.
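To make the phrase "gains of over 25% relative" concrete, the following minimal sketch shows how a relative error reduction is usually computed. It assumes the gain is measured on an error rate such as word error rate, which the abstract does not state explicitly. The function name `relative_reduction` and both input values are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Return the error reduction as a fraction of the baseline error rate."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


# Hypothetical error rates in percent, NOT figures reported in the paper.
baseline = 20.0   # assumed error rate of an older system
improved = 15.0   # assumed error rate of a newer system

gain = relative_reduction(baseline, improved)
print(f"absolute reduction: {baseline - improved:.1f} points")
print(f"relative reduction: {gain:.0%}")
```

With these illustrative inputs, a 5-point absolute reduction on a 20% baseline is a 25% relative reduction, which is why relative and absolute gains should not be compared directly.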