Collaborating on referring expressions. Computational Linguistics.
Making large-scale support vector machine learning practical. Advances in kernel methods.
Using Grice's maxim of quantity to select the content of plan descriptions. Artificial Intelligence.
International Journal of Human-Computer Studies - Special issue on collaboration, cooperation and conflict in dialogue systems.
Evaluating Natural Language Processing Systems: An Analysis and Review.
Lessons from a failure: generating tailored smoking cessation letters. Artificial Intelligence.
"Put-that-there": Voice and gesture at the graphics interface. SIGGRAPH '80 Proceedings of the 7th annual conference on Computer graphics and interactive techniques.
Cooking up referring expressions. ACL '89 Proceedings of the 27th annual meeting on Association for Computational Linguistics.
BLEU: a method for automatic evaluation of machine translation. ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics.
Robust PCFG-based generation using automatically acquired LFG approximations. ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics.
Generating Referring Expressions: Making Referents Easy to Identify. Computational Linguistics.
Intrinsic vs. extrinsic evaluation measures for referring expression generation. HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers.
A hearer-oriented evaluation of referring expression generation. ENLG '09 Proceedings of the 12th European Workshop on Natural Language Generation.
Report on the first NLG Challenge on Generating Instructions in Virtual Environments (GIVE). ENLG '09 Proceedings of the 12th European Workshop on Natural Language Generation.
Learning content selection rules for generating object descriptions in dialogue. Journal of Artificial Intelligence Research.
Choosing words in computer-generated weather forecasts. Artificial Intelligence - Special volume on connecting language to the world.
Generating and evaluating evaluative arguments. Artificial Intelligence.
Comparing objective and subjective measures of usability in a human-robot dialogue system. ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2.
Noun phrase generation for situated dialogs. INLG '06 Proceedings of the Fourth International Natural Language Generation Conference.
Towards an extrinsic evaluation of referring expressions in situated dialogs. INLG '10 Proceedings of the 6th International Natural Language Generation Conference.
The GREC Challenges 2010: overview and evaluation results. INLG '10 Proceedings of the 6th International Natural Language Generation Conference.
Report on the second NLG challenge on generating instructions in virtual environments (GIVE-2). INLG '10 Proceedings of the 6th International Natural Language Generation Conference.
Introducing shared tasks to NLG: the TUNA shared task evaluation challenges. Empirical methods in natural language generation.
Generating referring expressions in context: the GREC task evaluation challenges. Empirical methods in natural language generation.
Computational generation of referring expressions: A survey. Computational Linguistics.
Natural discourse reference generation reduces cognitive load in spoken systems. Natural Language Engineering.
Report on the second second challenge on generating instructions in virtual environments (GIVE-2.5). ENLG '11 Proceedings of the 13th European Workshop on Natural Language Generation.
REX-J: Japanese referring expression corpus of situated dialogs. Language Resources and Evaluation.
Appropriate evaluation of referring expressions is critical for designing systems that can collaborate effectively with humans. A widely used method is simply to evaluate the degree to which an algorithm reproduces the expressions found in previously collected corpora. Several researchers, however, have noted the need for a task-performance evaluation that measures how effective a referring expression is in achieving a given task goal. This is particularly important in collaborative situated dialogues. Using referring expressions produced by six pairs of Japanese speakers collaboratively solving Tangram puzzles, we conducted a task-performance evaluation of referring expressions with 36 human evaluators, focusing in particular on demonstrative pronouns generated by a machine-learning-based algorithm. Comparing the results of this task-performance evaluation with those of a previously conducted corpus-matching evaluation (Spanger et al. in Lang Resour Eval, 2010b), we confirm the limitations of corpus-matching evaluation and discuss the need for task-performance evaluation.