The field of large-vocabulary continuous speech recognition has advanced to the point where several systems attain between 90% and 95% word accuracy for speaker-independent recognition of a 1000-word vocabulary, spoken fluently, for a task with a perplexity (average word branching factor) of about 60. Several factors account for the high performance of these systems, including the use of hidden Markov models (HMMs) for acoustic modeling, the use of context-dependent sub-word units, the representation of between-word phonemic variation, and the use of corrective training techniques to emphasize differences between acoustically similar words in the vocabulary. In this paper we describe one of the large-vocabulary speech recognition systems under development at AT&T Bell Laboratories and discuss the methods used to provide high word recognition accuracy. In particular, we focus on the techniques used to obtain acoustic models of the sub-word units (both context-independent and context-dependent) and discuss the resulting system performance as a function of the type of acoustic modeling used.
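The abstract glosses perplexity as an "average word branching factor". The short sketch below illustrates that reading under the standard information-theoretic definition (the inverse geometric mean of the per-word probabilities a language model assigns to a test sequence). It is not taken from the paper: the function name `perplexity` and the toy probabilities are hypothetical and chosen only for illustration.

```python
import math


def perplexity(word_probs):
    """Perplexity of a word sequence, given the probability the language
    model assigned to each word in its context (illustrative helper,
    not from the paper)."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    # Average negative log2 probability per word (cross-entropy in bits).
    avg_bits = -sum(math.log2(p) for p in word_probs) / len(word_probs)
    # Perplexity is 2 raised to that per-word entropy.
    return 2 ** avg_bits


# If every word is a uniform choice among 60 successors, the perplexity
# equals the branching factor, which is the sense used in the abstract.
uniform_60 = [1 / 60] * 10
print(round(perplexity(uniform_60), 6))
```

Under this definition, a task with perplexity 60 is about as hard for the recognizer, on average, as choosing among 60 equally likely words at each position, even though the full vocabulary has 1000 words.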