Layering predictions: flexible use of dialog expectation in speech recognition

  • Authors:
  • Sheryl R. Young, Wayne H. Ward, Alexander G. Hauptmann

  • Affiliations:
  • Carnegie Mellon University, School of Computer Science, Pittsburgh, PA (all authors)

  • Venue:
  • IJCAI'89: Proceedings of the 11th International Joint Conference on Artificial Intelligence - Volume 2
  • Year:
  • 1989


Abstract

When computer speech recognition is used for problem solving or any plan-based task, predictable features of the user's behavior can be inferred and used to aid recognition of the speech input. The MINDS system generates expectations of what will be said next and uses them to assist speech recognition. Since a user does not always conform to system expectations, MINDS also handles violated expectations. We use pragmatic knowledge to dynamically derive constraints on what the user is likely to say next, then loosen those constraints in a principled manner to generate layered sets of predictions ranging from very specific to very general. To let the speech system give priority to recognizing what a user is most likely to say, a grammar is dynamically generated from each prediction set and used by the speech recognizer. A new set of grammars is created after each user utterance. The grammars are tried in order, most specific first, until an acceptable parse is found. This yields optimal performance when users behave predictably and graceful degradation when they do not.
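The fallback strategy the abstract describes can be sketched as a loop over grammars ordered tightest-first. The sketch below is a toy illustration, not the MINDS implementation: `Grammar`, `parse`, `recognize_layered`, and the example utterances are hypothetical stand-ins (MINDS operated in a naval resource-management domain with real recognizer grammars, not sets of literal strings).

```python
# Toy sketch of the layered-prediction fallback described in the abstract:
# after each system turn, derive prediction sets from most specific to most
# general, compile each into a grammar, and try them in that order until
# one yields an acceptable parse. All names here are hypothetical.

from typing import Optional, Sequence

Grammar = frozenset  # toy grammar: the set of utterances it accepts

def parse(grammar: Grammar, utterance: str) -> Optional[str]:
    """Stand-in for the recognizer's grammar-constrained decoding step."""
    return utterance if utterance in grammar else None

def recognize_layered(utterance: str,
                      layers: Sequence[Grammar]) -> Optional[str]:
    """Try grammars from most specific to most general.

    A predictable user is matched by an early, tightly constrained layer;
    a user who violates expectations falls through to looser layers,
    giving graceful degradation instead of outright failure.
    """
    for grammar in layers:  # layers are ordered tightest-first
        result = parse(grammar, utterance)
        if result is not None:
            return result
    return None  # no layer produced an acceptable parse

# Hypothetical example: the dialog most strongly expects a status question,
# but progressively looser layers admit other plausible utterances.
layers = [
    Grammar({"what is the status of the enterprise"}),  # most specific
    Grammar({"what is the status of the enterprise",
             "where is the kennedy"}),                  # looser
    Grammar({"what is the status of the enterprise",
             "where is the kennedy",
             "list all ships in the atlantic"}),        # most general
]
print(recognize_layered("where is the kennedy", layers))
```

In this sketch the second, looser layer recovers an utterance the most specific grammar would have rejected, mirroring the paper's point: specific grammars are tried first for speed and accuracy, and general ones serve as a safety net when expectations are violated.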