The DARPA Spoken Language effort has profited greatly from its emphasis on tasks and common evaluation metrics. Common, standardized evaluation procedures have helped the community focus its research effort, measure progress, and encourage communication among participating sites. The task and the evaluation metrics, however, must be consistent with the goals of the Spoken Language program, namely interactive problem solving. Our evaluation methods have evolved with the technology, moving from evaluation of read speech from a fixed corpus, through evaluation of isolated canned sentences, to evaluation of spontaneous speech in context within a canned corpus. A key component missing from current evaluations is the role of the subject's interaction with the system. Because of the great variability across subjects, however, such evaluation requires either a large number of subjects or a within-subject design. This paper proposes a within-subject design that compares the results of a software-sharing exercise carried out jointly by MIT and SRI.
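To make the statistical point concrete, the following is a minimal sketch of a within-subject (paired) comparison of two systems. Because each subject serves as his or her own control, stable subject-level differences cancel out of the paired comparison, which is why the design needs far fewer subjects than a between-subject study. The metric (task-completion time in seconds), the subject identifiers, and all numbers here are hypothetical illustrations, not data from the MIT/SRI exercise.

import math
from statistics import mean, stdev

# Hypothetical per-subject results: each subject solves comparable
# travel-planning scenarios on both systems (system_a, system_b).
completion_time = {
    "s01": (312.0, 280.0),
    "s02": (455.0, 410.0),
    "s03": (298.0, 301.0),
    "s04": (390.0, 352.0),
}

# Per-subject paired differences: between-subject variability drops out
# because both measurements come from the same subject.
diffs = [a - b for a, b in completion_time.values()]
n = len(diffs)

# Paired t statistic: mean difference divided by its standard error.
t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
print(f"mean paired difference: {mean(diffs):.1f} s, t({n - 1}) = {t:.2f}")

A between-subject design would instead compare group means against the full cross-subject spread, so detecting the same effect would require many more subjects.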