Evaluation is a major driving force in advancing the state of the art in language technologies. In particular, automatic assessment of the quality of machine output is the preferred way to measure progress, provided the metrics have been validated against human judgments. Following recent developments in the automatic evaluation of machine translation and document summarization, we present a similar approach, implemented in a measure called POURPRE, an automatic technique for evaluating answers to complex questions based on n-gram co-occurrences between machine output and a human-generated answer key. Until now, the only way to assess the correctness of answers to such questions has been to manually determine whether each information "nugget" appears in a system's response. This lack of automatic scoring methods is an impediment to progress in the field, which we address with this work. Experiments with the TREC 2003, TREC 2004, and TREC 2005 QA tracks indicate that rankings produced by our metric correlate highly with the official rankings, and that POURPRE outperforms direct application of existing metrics.
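To make the idea concrete, the Python sketch below illustrates one way n-gram co-occurrence can replace a binary human nugget judgment: each nugget receives a soft match score (the fraction of its terms found in the system response), which is then plugged into the standard TREC nugget F-score. This is a minimal sketch under assumptions not stated in the abstract (whitespace tokenization, unigram overlap, a 100-character length allowance per matched nugget, beta = 3); the function names are hypothetical, and the actual POURPRE measure includes further refinements.

    def match(nugget: str, response: str) -> float:
        """Soft match: fraction of nugget terms appearing in the response."""
        nugget_terms = nugget.lower().split()
        response_terms = set(response.lower().split())
        if not nugget_terms:
            return 0.0
        hits = sum(1 for t in nugget_terms if t in response_terms)
        return hits / len(nugget_terms)

    def nugget_fscore(response: str, vital: list[str], okay: list[str],
                      beta: float = 3.0) -> float:
        """TREC-style nugget F(beta) with soft (unigram-overlap) matching."""
        if not vital:
            return 0.0
        # Recall: average soft match over the vital nuggets only.
        r = sum(match(n, response) for n in vital) / len(vital)
        # Precision: length allowance of 100 characters per matched nugget
        # (vital or okay), penalizing responses that exceed it.
        allowance = 100 * sum(match(n, response) for n in vital + okay)
        length = len(response)
        p = 1.0 if length <= allowance else 1.0 - (length - allowance) / length
        if p + r == 0:
            return 0.0
        return (1 + beta**2) * p * r / (beta**2 * p + r)

Because the per-nugget score is continuous rather than binary, a response containing most but not all of a nugget's terms still earns partial credit, which is what lets the metric be computed without a human in the loop.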