This paper presents PEACE (French acronym for Paradigme d'Evaluation Automatique de la Compréhension hors et En-contexte), a paradigm for evaluating the context-sensitive understanding capability of any spoken language dialog system. This paradigm will form the basis of the French Technolangue MEDIA project, in which dialog systems from various academic and industrial sites will be tested in an evaluation campaign coordinated by ELRA/ELDA over the next two years. Despite previous efforts such as EAGLES, DISC, AUPELF ARC B2, and the ongoing American DARPA Communicator project, the spoken dialog community still lacks common reference tasks and widely agreed-upon methods for comparing and diagnosing systems and techniques. Automatic solutions are now being sought both to enable the comparison of different approaches through reliable indicators and generic evaluation methodologies, and to reduce system development costs. However, achieving independence from both the dialog system and the task performed increasingly appears unattainable. Until now, most evaluations have either treated the system as a whole or based their measurements on dialog-context-free information. The PEACE proposal aims to bypass some of these shortcomings by extracting, from real dialog corpora, test sets that synthesize contextual information.
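To make the final idea concrete, here is a minimal sketch of deriving context-bearing test items from an annotated dialog transcript. All names, the record format, and the fixed-window notion of "context" are illustrative assumptions, not the actual PEACE or MEDIA specification.

```python
# Hypothetical sketch: turn an annotated dialog into test items that carry
# the contextual information needed to evaluate in-context understanding.
# The data format and helper names are assumptions for illustration only.

from dataclasses import dataclass


@dataclass
class TestItem:
    context: list      # preceding dialog turns approximating the dialog state
    utterance: str     # user utterance to be understood in that context
    reference: dict    # reference semantic interpretation for scoring


def extract_test_items(dialog, window=2):
    """Extract one test item per user turn from a dialog transcript.

    `dialog` is a list of (speaker, text, semantics) triples; each user
    turn yields an item whose context is the `window` preceding turns.
    """
    items = []
    for i, (speaker, text, semantics) in enumerate(dialog):
        if speaker != "user":
            continue
        context = [t for (_, t, _) in dialog[max(0, i - window):i]]
        items.append(TestItem(context=context, utterance=text,
                              reference=semantics))
    return items


# Toy travel-domain dialog: the second user turn is only interpretable
# in context ("the same as last time" is anaphoric).
dialog = [
    ("system", "Which city are you leaving from?", {}),
    ("user", "Paris", {"departure_city": "Paris"}),
    ("system", "And where are you going?", {}),
    ("user", "the same as last time", {"arrival_city": "<anaphoric>"}),
]
items = extract_test_items(dialog)
```

Packaging the context alongside each utterance is what lets an understanding module be tested in isolation, without running the whole dialog system, which is the kind of system- and task-independence the abstract argues is otherwise hard to reach.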