Task-based evaluation of text summarization using Relevance Prediction

  • Authors:
  • Stacy President Hobson; Bonnie J. Dorr; Christof Monz; Richard Schwartz

  • Affiliations:
  • Stacy President Hobson, Bonnie J. Dorr: Department of Computer Science and UMIACS, University of Maryland, College Park, MD 20742, United States
  • Christof Monz: Department of Computer Science, Queen Mary, University of London, London E1 4NS, UK
  • Richard Schwartz: BBN Technologies, Columbia, MD 21046, United States

  • Venue:
  • Information Processing and Management: an International Journal
  • Year:
  • 2007

Abstract

This article introduces a new task-based evaluation measure called Relevance Prediction, which is a more intuitive measure of an individual's performance on a real-world task than interannotator agreement. Relevance Prediction parallels what a user does in the real-world task of browsing a set of documents with standard search tools: the user judges relevance based on a short summary, and then that same user (not an independent user) decides whether to open, and judge, the corresponding document. This measure is shown to be a more reliable measure of task performance than LDC Agreement, a gold-standard-based measure currently used in the summarization evaluation community. Our goal is to provide a stable framework within which developers of new automatic measures can make stronger statistical statements about the effectiveness of their measures in predicting summary usefulness. As a proof-of-concept methodology for automatic metric developers, we demonstrate that a current automatic evaluation measure correlates better with Relevance Prediction than with LDC Agreement, and that the significance level for the detected differences is higher for the former than for the latter.
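
The abstract describes Relevance Prediction informally: the same user first judges relevance from a summary and then from the corresponding full document, and the measure reflects how often those two judgments agree, whereas LDC Agreement compares summary-based judgments against an independent gold standard. The sketch below illustrates that contrast under assumed data structures; the function names, record layout, and binary relevance labels are hypothetical and are not taken from the paper.

```python
# Illustrative sketch of the Relevance Prediction idea described in the abstract.
# Data layout and names are assumptions for exposition, not the authors' code.

from typing import Dict, Tuple

# Each judgment record maps (user_id, doc_id) -> "relevant" / "not relevant"
SummaryJudgments = Dict[Tuple[str, str], str]
DocJudgments = Dict[Tuple[str, str], str]


def relevance_prediction(summary_j: SummaryJudgments, doc_j: DocJudgments) -> float:
    """Fraction of (user, document) pairs where the SAME user's summary-based
    judgment matches that user's judgment of the full document."""
    shared = [key for key in summary_j if key in doc_j]
    if not shared:
        return 0.0
    matches = sum(1 for key in shared if summary_j[key] == doc_j[key])
    return matches / len(shared)


def ldc_agreement(summary_j: SummaryJudgments, gold: Dict[str, str]) -> float:
    """Fraction of summary-based judgments that match an independent
    gold-standard judgment (e.g., an LDC annotator) of the document."""
    shared = [(user, doc) for (user, doc) in summary_j if doc in gold]
    if not shared:
        return 0.0
    matches = sum(1 for (user, doc) in shared if summary_j[(user, doc)] == gold[doc])
    return matches / len(shared)


if __name__ == "__main__":
    summary_j = {("u1", "d1"): "relevant", ("u1", "d2"): "not relevant"}
    doc_j     = {("u1", "d1"): "relevant", ("u1", "d2"): "relevant"}
    gold      = {"d1": "relevant", "d2": "not relevant"}
    print("Relevance Prediction:", relevance_prediction(summary_j, doc_j))  # 0.5
    print("LDC Agreement:", ldc_agreement(summary_j, gold))                 # 1.0
```

The toy example shows how the two measures can diverge for the same summaries: a user may agree perfectly with the gold standard while still being misled about their own eventual document-level judgment, which is the behavior Relevance Prediction is intended to capture.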