This paper presents PEACE (French acronym for Paradigme d'Evaluation Automatique de la Compréhension hors et En-contexte), a paradigm for evaluating the context-sensitive understanding capability of any spoken language dialog system. This paradigm will form the basis of the French Technolangue MEDIA project, in which dialog systems from various academic and industrial sites will be tested in an evaluation campaign coordinated by ELRA/ELDA over the next two years. Despite previous efforts such as EAGLES, DISC, AUPELF ARC B2, and the ongoing American DARPA Communicator project, the spoken dialog community still lacks common reference tasks and widely agreed-upon methods for comparing and diagnosing systems and techniques. Automatic solutions are now being sought both to enable the comparison of different approaches through reliable indicators and generic evaluation methodologies, and to reduce system development costs. However, achieving independence from both the dialog system and the task performed increasingly appears unattainable. Until now, most evaluations have either treated the system as a whole or based their measurements on dialog-context-free information. The PEACE proposal aims to bypass some of these shortcomings by extracting, from real dialog corpora, test sets that synthesize contextual information.
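To make the final idea concrete, here is a minimal sketch of deriving context-bearing test items from an annotated dialog transcript. All names, the record format, and the fixed-window notion of "context" are illustrative assumptions, not the actual PEACE or MEDIA specification.

```python
# Hypothetical sketch: turn an annotated dialog into test items that carry
# the contextual information needed to evaluate in-context understanding.
# The data format and helper names are assumptions for illustration only.

from dataclasses import dataclass


@dataclass
class TestItem:
    context: list      # preceding dialog turns approximating the dialog state
    utterance: str     # user utterance to be understood in that context
    reference: dict    # reference semantic interpretation for scoring


def extract_test_items(dialog, window=2):
    """Extract one test item per user turn from a dialog transcript.

    `dialog` is a list of (speaker, text, semantics) triples; each user
    turn yields an item whose context is the `window` preceding turns.
    """
    items = []
    for i, (speaker, text, semantics) in enumerate(dialog):
        if speaker != "user":
            continue
        context = [t for (_, t, _) in dialog[max(0, i - window):i]]
        items.append(TestItem(context=context, utterance=text,
                              reference=semantics))
    return items


# Toy travel-domain dialog: the second user turn is only interpretable
# in context ("the same as last time" is anaphoric).
dialog = [
    ("system", "Which city are you leaving from?", {}),
    ("user", "Paris", {"departure_city": "Paris"}),
    ("system", "And where are you going?", {}),
    ("user", "the same as last time", {"arrival_city": "<anaphoric>"}),
]
items = extract_test_items(dialog)
```

Packaging the context alongside each utterance is what lets an understanding module be tested in isolation, without running the whole dialog system, which is the kind of system- and task-independence the abstract argues is otherwise hard to reach.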