Following recent developments in the automatic evaluation of machine translation and document summarization, we present a similar approach, implemented in a measure called Pourpre, for automatically evaluating answers to definition questions. Until now, the only way to assess the correctness of answers to such questions has been to determine manually whether each information nugget appears in a system's response. The lack of automatic methods for scoring system output impedes progress in the field, and this work addresses that gap. Experiments with the TREC 2003 and TREC 2004 QA tracks indicate that the system rankings produced by our metric correlate highly with the official rankings, and that Pourpre outperforms direct application of existing metrics.
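The claim that an automatic metric's rankings "correlate highly" with official rankings can be made concrete with a rank correlation coefficient. The sketch below uses Kendall's tau as one common choice; the abstract does not say which statistic was used, so this is an assumption. The function name `kendall_tau` and all scores are hypothetical and purely illustrative, not results from the paper.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score lists over the same systems.

    Counts system pairs that the two metrics order the same way
    (concordant) versus opposite ways (discordant). Tied pairs count
    as neither, which is the tau-a convention.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both lists must score the same systems")
    n = len(scores_a)
    if n < 2:
        raise ValueError("need at least two systems to compare rankings")

    concordant = 0
    discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both metrics agree on which system is better.
        direction = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1

    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs


# Hypothetical scores for five systems (illustrative only, not paper data).
official = [0.55, 0.47, 0.40, 0.31, 0.20]   # assumed manual (official) scores
automatic = [0.60, 0.42, 0.45, 0.30, 0.18]  # assumed automatic metric scores

tau = kendall_tau(official, automatic)
print(f"Kendall's tau: {tau:.2f}")
```

A value near 1 means the automatic metric orders systems almost exactly as the manual assessment does; a value near 0 means the two orderings are unrelated. In this toy data only one of the ten system pairs is ordered differently by the two metrics.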