An ecologically valid evaluation of speech summarization

  • Authors: Anthony McCallum; Cosmin Munteanu; Gerald Penn; Xiaodan Zhu

  • Affiliations: University of Toronto, Toronto, Ontario, Canada; National Research Council Canada & University of Toronto, Fredericton, New Brunswick, Canada; University of Toronto, Toronto, Ontario, Canada; National Research Council Canada, Ottawa, Ontario, Canada

  • Venue: CHI '12 Extended Abstracts on Human Factors in Computing Systems
  • Year: 2012

Abstract

The past decade has witnessed an explosion in the size and availability of online audio-visual repositories of content such as entertainment, news, and lectures. Summarization systems have the potential to provide significant assistance with navigating such repositories. Unfortunately, automatically generated summaries often fall short of delivering the information needed by users. This is due, in no small part, to the fact that the natural language heuristics used to generate summaries are often optimized with respect to currently used evaluation metrics. Such metrics simply score automatically generated summaries against subjectively classified gold standards, without taking into account the usefulness of a summary in helping a user achieve a certain goal, or even overall summary coherence. We have previously shown that an immediate consequence of this problem is that even the most linguistically complex summarization systems perform no better than basic heuristics, such as picking the longest sentences from a general-topic, spontaneous dialog, or the first few sentences from a news recording. Our hypothesis is that complex systems are in fact better, if measured properly. What is needed instead are evaluation metrics (and, consequently, automatic summarizers) that incorporate features such as user preferences and task orientation. To this end, we propose an ecologically valid evaluation metric that determines the value of a summary when embedded in a task, rather than how closely a summary matches a gold standard.
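
For readers unfamiliar with the basic heuristics mentioned in the abstract, the following is a minimal illustrative sketch of the two baselines it names: taking the first few sentences, and picking the longest sentences. It is not the authors' implementation. The function names, the parameter `k`, and the use of whitespace-separated word count as the measure of sentence length are all assumptions made for illustration, and the input is assumed to be a transcript already split into sentences.

```python
from typing import List


def lead_baseline(sentences: List[str], k: int = 3) -> List[str]:
    """Hypothetical 'first few sentences' baseline: keep the first k sentences."""
    return list(sentences[:k])


def longest_baseline(sentences: List[str], k: int = 3) -> List[str]:
    """Hypothetical 'longest sentences' baseline.

    Length is measured as whitespace-separated word count (an assumption).
    The selected sentences are returned in their original order.
    """
    # Rank sentence indices from longest to shortest; ties keep document order.
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: len(sentences[i].split()),
        reverse=True,
    )
    # Keep the top k indices, then restore document order.
    chosen = sorted(ranked[:k])
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    transcript = [
        "Good evening.",
        "The city council voted on the new transit plan today.",
        "Thanks.",
        "Critics argue the plan underestimates long-term maintenance costs.",
    ]
    print(lead_baseline(transcript, k=2))
    print(longest_baseline(transcript, k=2))
```

Neither baseline considers what a user is trying to accomplish with the summary, which is the gap a task-embedded evaluation is meant to expose.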