The hub and spoke paradigm for CSR evaluation

  • Authors:
  • Francis Kubala;Jerome Bellegarda;Jordan Cohen;David Pallett;Doug Paul;Mike Phillips;Raja Rajasekaran;Fred Richardson;Michael Riley;Roni Rosenfeld;Bob Roth;Mitch Weintraub

  • Affiliations:
  • BBN Systems and Technologies;IBM T. J. Watson Research Center;Institute for Defense Analyses;National Institute of Standards and Technology;MIT Lincoln Laboratory;MIT Laboratory for Computer Science;Texas Instruments;Boston University;AT&T Bell Laboratories;Carnegie Mellon University;Dragon Systems, Inc.;SRI International

  • Venue:
  • HLT '94 Proceedings of the workshop on Human Language Technology
  • Year:
  • 1994

Abstract

In this paper, we introduce the new paradigm used in the most recent ARPA-sponsored Continuous Speech Recognition (CSR) evaluation and then discuss the important features of the test design.

The 1993 CSR evaluation was organized in a novel fashion in an attempt to accommodate research over a broad variety of important problems in CSR while maintaining a clear program-wide research focus. Furthermore, each test component in the evaluation was designed as an experiment to extract as much information as possible from the results.

The evaluation was centered around a large vocabulary speaker-independent (SI) baseline test, which was required of every participating site. This test was dubbed the 'Hub' since it was common to all sites and formed the basis for controlled inter-system comparisons.

The Hub test was augmented with a set of optional, problem-specific tests designed to explore important problems in CSR, mostly involving some kind of mismatch between the training and test conditions. These tests were known as the 'Spokes' since they all could be informatively compared to the Hub, but were otherwise independent.

In the first trial of this evaluation paradigm in November 1993, 11 research groups participated, yielding a rich array of comparative and contrastive results, all calibrated to the current state of the art in large vocabulary CSR.