Comparative experiments on large vocabulary speech recognition

Authors:
Richard Schwartz;Tasos Anastasakos;Francis Kubala;John Makhoul;Long Nguyen;George Zavaliagkos
Affiliations:
BBN Systems & Technologies, Cambridge, MA;BBN Systems & Technologies, Cambridge, MA;BBN Systems & Technologies, Cambridge, MA;BBN Systems & Technologies, Cambridge, MA;BBN Systems & Technologies, Cambridge, MA;BBN Systems & Technologies, Cambridge, MA
Venue:
HLT '93 Proceedings of the workshop on Human Language Technology
Year:
1993

Abstract

This paper describes several key experiments in large vocabulary speech recognition. We demonstrate that, counter to our intuitions, given a fixed amount of training speech, the number of training speakers has little effect on recognition accuracy. We show how much speech is needed for speaker-independent (SI) recognition to match the performance of speaker-dependent (SD) recognition. We demonstrate that, although the N-Best Paradigm works quite well for vocabularies of up to 5,000 words, it begins to break down with 20,000 words and long sentences. We compare the performance of two feature preprocessing algorithms for microphone independence, and we describe a new microphone adaptation algorithm based on selection among several codebook transformations.