Collaborating on referring expressions. Computational Linguistics.
Making large-scale support vector machine learning practical. Advances in kernel methods.
Using Grice's maxim of quantity to select the content of plan descriptions. Artificial Intelligence.
International Journal of Human-Computer Studies - Special issue on collaboration, cooperation and conflict in dialogue systems.
Evaluating Natural Language Processing Systems: An Analysis and Review.
Lessons from a failure: generating tailored smoking cessation letters. Artificial Intelligence.
"Put-that-there": Voice and gesture at the graphics interface. SIGGRAPH '80 Proceedings of the 7th annual conference on Computer graphics and interactive techniques.
Cooking up referring expressions. ACL '89 Proceedings of the 27th annual meeting on Association for Computational Linguistics.
BLEU: a method for automatic evaluation of machine translation. ACL '02 Proceedings of the 40th Annual Meeting on Association for Computational Linguistics.
Robust PCFG-based generation using automatically acquired LFG approximations. ACL-44 Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics.
Generating Referring Expressions: Making Referents Easy to Identify. Computational Linguistics.
Intrinsic vs. extrinsic evaluation measures for referring expression generation. HLT-Short '08 Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics on Human Language Technologies: Short Papers.
A hearer-oriented evaluation of referring expression generation. ENLG '09 Proceedings of the 12th European Workshop on Natural Language Generation.
Report on the first NLG Challenge on Generating Instructions in Virtual Environments (GIVE). ENLG '09 Proceedings of the 12th European Workshop on Natural Language Generation.
Learning content selection rules for generating object descriptions in dialogue. Journal of Artificial Intelligence Research.
Choosing words in computer-generated weather forecasts. Artificial Intelligence - Special volume on connecting language to the world.
Generating and evaluating evaluative arguments. Artificial Intelligence.
Comparing objective and subjective measures of usability in a human-robot dialogue system. ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2.
Noun phrase generation for situated dialogs. INLG '06 Proceedings of the Fourth International Natural Language Generation Conference.
Towards an extrinsic evaluation of referring expressions in situated dialogs. INLG '10 Proceedings of the 6th International Natural Language Generation Conference.
The GREC Challenges 2010: overview and evaluation results. INLG '10 Proceedings of the 6th International Natural Language Generation Conference.
Report on the second NLG challenge on generating instructions in virtual environments (GIVE-2). INLG '10 Proceedings of the 6th International Natural Language Generation Conference.
Introducing shared tasks to NLG: the TUNA shared task evaluation challenges. Empirical methods in natural language generation.
Generating referring expressions in context: the GREC task evaluation challenges. Empirical methods in natural language generation.
Computational generation of referring expressions: A survey. Computational Linguistics.
Natural discourse reference generation reduces cognitive load in spoken systems. Natural Language Engineering.
Report on the second second challenge on generating instructions in virtual environments (GIVE-2.5). ENLG '11 Proceedings of the 13th European Workshop on Natural Language Generation.
REX-J: Japanese referring expression corpus of situated dialogs. Language Resources and Evaluation.
Appropriate evaluation of referring expressions is critical for designing systems that can collaborate effectively with humans. A widely used method is simply to evaluate the degree to which an algorithm reproduces the expressions found in previously collected corpora. Several researchers, however, have noted the need for a task-performance evaluation that measures how effective a referring expression is in achieving a given task goal. This is particularly important in collaborative situated dialogues. Using referring expressions produced by six pairs of Japanese speakers collaboratively solving Tangram puzzles, we conducted a task-performance evaluation of referring expressions with 36 human evaluators, focusing in particular on demonstrative pronouns generated by a machine-learning-based algorithm. Comparing the results of this task-performance evaluation with those of a previously conducted corpus-matching evaluation (Spanger et al. in Lang Resour Eval, 2010b), we confirm the limitations of corpus-matching evaluation and discuss the need for task-performance evaluation.