Automatically evaluating answers to definition questions

  • Authors: Jimmy Lin; Dina Demner-Fushman
  • Affiliations: University of Maryland, College Park, MD (both authors)

  • Venue: HLT '05: Proceedings of the Conference on Human Language Technology and Empirical Methods in Natural Language Processing
  • Year: 2005

Abstract

Following recent developments in the automatic evaluation of machine translation and document summarization, we present a similar approach, implemented in a measure called Pourpre, for automatically evaluating answers to definition questions. Until now, the only way to assess the correctness of answers to such questions has been to manually determine whether each information nugget appears in a system's response. The lack of automatic methods for scoring system output is an impediment to progress in the field, which we address with this work. Experiments with the TREC 2003 and TREC 2004 QA tracks indicate that rankings produced by our metric correlate highly with official rankings, and that Pourpre outperforms direct application of existing metrics.
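
Illustrative Sketch

To make the abstract's approach concrete: Pourpre replaces the manual binary judgment of whether a nugget appears in a response with an automatic term-overlap score, which is then plugged into a TREC-style definition-question F-measure. Below is a minimal Python sketch under those assumptions. The unigram soft match, the 100-character allowance per matched nugget, and beta = 5 follow the TREC 2003/2004 scoring conventions, but the function names and exact matching details here are illustrative assumptions, not the authors' released implementation.

def term_overlap(nugget: str, answer: str) -> float:
    """Fraction of nugget terms that appear in the answer string."""
    nugget_terms = set(nugget.lower().split())
    answer_terms = set(answer.lower().split())
    if not nugget_terms:
        return 0.0
    return len(nugget_terms & answer_terms) / len(nugget_terms)

def pourpre_f(nuggets: list[str], answers: list[str], beta: float = 5.0) -> float:
    """Soft F-measure over a system's answer strings for one question."""
    # Soft match: credit each nugget with its best unigram overlap
    # against any single answer string (in place of a manual judgment).
    matches = [max((term_overlap(n, a) for a in answers), default=0.0)
               for n in nuggets]
    recall = sum(matches) / len(nuggets) if nuggets else 0.0

    # Length-based precision with a per-nugget character allowance,
    # mirroring the TREC definition-question evaluation.
    allowance = 100 * sum(matches)
    length = sum(len(a.replace(" ", "")) for a in answers)
    precision = 1.0 if length <= allowance else 1.0 - (length - allowance) / length

    if precision + recall == 0.0:
        return 0.0
    # Recall-weighted F-measure; TREC used beta = 5 for definition questions.
    return ((beta ** 2 + 1) * precision * recall) / (beta ** 2 * precision + recall)

if __name__ == "__main__":
    # Hypothetical nuggets and system response, for illustration only.
    nuggets = ["invented the telephone", "born in Scotland"]
    answers = ["Alexander Graham Bell invented the telephone in 1876."]
    print(f"Pourpre-style F-score: {pourpre_f(nuggets, answers):.3f}")

In this toy run the first nugget matches fully and the second only partially ("in" alone overlaps), so recall is about 0.67; the response is short enough to stay within the length allowance, so precision is 1.0, yielding an F-score of roughly 0.68.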