The hub and spoke paradigm for CSR evaluation

Authors:
Francis Kubala;Jerome Bellegarda;Jordan Cohen;David Pallett;Doug Paul;Mike Phillips;Raja Rajasekaran;Fred Richardson;Michael Riley;Roni Rosenfeld;Bob Roth;Mitch Weintraub
Affiliations:
BBN Systems and Technologies;IBM T. J. Watson Research Center;Institute for Defense Analyses;National Institute of Standards and Technology;MIT Lincoln Laboratory;MIT Laboratory for Computer Science;Texas Instruments;Boston University;AT&T Bell Laboratories;Carnegie Mellon University;Dragon Systems, Inc.;SRI International
Venue:
HLT '94 Proceedings of the workshop on Human Language Technology
Year:
1994

Abstract

In this paper, we introduce the new paradigm used in the most recent ARPA-sponsored Continuous Speech Recognition (CSR) evaluation and then discuss the important features of the test design.

The 1993 CSR evaluation was organized in a novel fashion in an attempt to accommodate research over a broad variety of important problems in CSR while maintaining a clear program-wide research focus. Furthermore, each test component in the evaluation was designed as an experiment to extract as much information as possible from the results.

The evaluation was centered around a large vocabulary speaker-independent (SI) baseline test, which was required of every participating site. This test was dubbed the 'Hub' since it was common to all sites and formed the basis for controlled inter-system comparisons.

The Hub test was augmented with a set of optional, problem-specific tests designed to explore important problems in CSR, mostly involving some kind of mismatch between the training and test conditions. These tests were known as the 'Spokes' since they all could be informatively compared to the Hub, but were otherwise independent.

In the first trial of this evaluation paradigm in November 1993, 11 research groups participated, yielding a rich array of comparative and contrastive results, all calibrated to the current state of the art in large vocabulary CSR.