Acoustic modeling of subword units for large vocabulary speaker independent speech recognition

Authors:
Chin-Hui Lee; Lawrence R. Rabiner; Roberto Pieraccini; Jay G. Wilpon
Affiliations:
AT&T Bell Laboratories, Murray Hill, NJ (all four authors)
Venue:
HLT '89 Proceedings of the workshop on Speech and Natural Language
Year:
1989

Abstract

The field of large-vocabulary continuous speech recognition has advanced to the point where several systems can attain between 90% and 95% word accuracy in speaker-independent recognition of fluently spoken sentences drawn from a 1000-word vocabulary, on a task with a perplexity (average word branching factor) of about 60. Several factors account for the high performance of these systems. These include the use of hidden Markov models (HMMs) for acoustic modeling, the use of context-dependent subword units, the representation of between-word phonemic variation, and the use of corrective training techniques that emphasize differences between acoustically similar words in the vocabulary. In this paper we describe one of the large-vocabulary speech recognition systems being developed at AT&T Bell Laboratories and discuss the methods used to achieve high word recognition accuracy. In particular, we focus on the techniques used to obtain acoustic models of the subword units (both context-independent and context-dependent units), and we discuss the resulting system performance as a function of the type of acoustic modeling used.
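To make the distinction between context-independent and context-dependent subword units concrete, here is a minimal illustrative sketch of how a word string might be expanded into each kind of unit, with and without between-word context. It assumes a hypothetical toy lexicon, ARPAbet-style phone symbols, a `sil` boundary label, and the common `left-phone+right` triphone naming convention; none of these details is taken from the paper, and the unit inventory and between-word modeling of the AT&T system may differ.

```python
from typing import Dict, List

# Hypothetical toy lexicon (not from the paper): word -> phone sequence.
LEXICON: Dict[str, List[str]] = {
    "show": ["sh", "ow"],
    "all": ["ao", "l"],
    "ships": ["sh", "ih", "p", "s"],
}

SIL = "sil"  # assumed label for silence at utterance (or word) boundaries


def context_independent_units(words: List[str]) -> List[str]:
    """Concatenate each word's phones; every phone is its own unit."""
    return [p for w in words for p in LEXICON[w]]


def context_dependent_units(words: List[str], cross_word: bool = True) -> List[str]:
    """Label each phone with its left and right neighbours as 'left-phone+right'.

    With cross_word=True, phones at a word boundary take their context from the
    adjacent word (a simple model of between-word coarticulation). With
    cross_word=False, word boundaries are treated like silence, so each word's
    units do not depend on its neighbours.
    """
    units: List[str] = []
    for i, w in enumerate(words):
        phones = LEXICON[w]
        for j, p in enumerate(phones):
            # Left context: previous phone in the word, else last phone of the previous word.
            if j > 0:
                left = phones[j - 1]
            elif cross_word and i > 0:
                left = LEXICON[words[i - 1]][-1]
            else:
                left = SIL
            # Right context: next phone in the word, else first phone of the next word.
            if j < len(phones) - 1:
                right = phones[j + 1]
            elif cross_word and i < len(words) - 1:
                right = LEXICON[words[i + 1]][0]
            else:
                right = SIL
            units.append(f"{left}-{p}+{right}")
    return units
```

For the word string `["show", "all", "ships"]`, the context-independent sequence uses the same unit `sh` twice, whereas the cross-word context-dependent sequence distinguishes `sil-sh+ow` from `l-sh+ih`. With `cross_word=False`, the final phone of "show" becomes `sh-ow+sil` rather than `sh-ow+ao`. The price of this extra detail is a much larger unit inventory, each member of which has less training data; that trade-off is what makes the choice of acoustic modeling for subword units an interesting design question.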