Evaluation of automatic summaries: metrics under varying data conditions

  • Authors:
  • Karolina Owczarzak; Hoa Trang Dang

  • Affiliations:
  • National Institute of Standards and Technology, Gaithersburg, MD (both authors)

  • Venue:
  • UCNLG+Sum '09 Proceedings of the 2009 Workshop on Language Generation and Summarisation
  • Year:
  • 2009

Abstract

In the evaluation of automatic summaries, multiple topics and human-produced model summaries are needed for the assessment to be stable and reliable. However, providing many topics and models is costly and time-consuming. This paper examines how the number of available models and topics affects the correlations with human judgment obtained by the automatic metrics ROUGE and BE, as well as by the manual Pyramid method. Testing all of these methods on the same data set, taken from the TAC 2008 Summarization track, allows us to compare and contrast them under different conditions.
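
As a rough illustration of the kind of analysis the abstract describes (not code from the paper; all counts, names, and data below are synthetic placeholders), the sketch computes the system-level Pearson correlation between an automatic metric and human judgments while varying how many topics are available, averaging per-topic metric scores before correlating.

```python
# Illustrative sketch only: correlation between metric scores and human judgments
# as a function of the number of topics used. Data and sizes are hypothetical.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
n_systems, n_topics = 58, 48                       # illustrative counts
human = rng.normal(size=n_systems)                 # synthetic per-system human scores
# synthetic per-system, per-topic metric scores loosely tied to the human scores
metric = human[:, None] + rng.normal(scale=1.5, size=(n_systems, n_topics))

for k in (1, 4, 8, 16, 32, 48):                    # topics made available
    corrs = []
    for _ in range(100):                           # resample topic subsets
        topics = rng.choice(n_topics, size=k, replace=False)
        system_scores = metric[:, topics].mean(axis=1)   # average over sampled topics
        corrs.append(pearsonr(system_scores, human)[0])
    print(f"{k:2d} topics: mean Pearson r = {np.mean(corrs):.3f}")
```

Under this toy setup, correlations stabilize as more topics are averaged; the paper studies the analogous behaviour for ROUGE, BE, and Pyramid on real TAC 2008 data.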