Corpus and evaluation measures for multiple document summarization with multiple sources

  • Authors: Tsutomu Hirao; Takahiro Fukusima; Manabu Okumura; Chikashi Nobata; Hidetsugu Nanba

  • Affiliations: NTT Communication Science Laboratories; Otemon Gakuin University; Tokyo Institute of Technology; Communication Research Laboratories; Hiroshima City University

  • Venue: COLING '04: Proceedings of the 20th International Conference on Computational Linguistics
  • Year: 2004

Abstract

In this paper, we introduce a large-scale test collection for multiple document summarization, the Text Summarization Challenge 3 (TSC3) corpus. We detail the corpus construction and the evaluation measures. A distinguishing feature of the corpus is that it annotates not only the important sentences in a document set, but also which of those sentences convey the same content. Moreover, we define new evaluation metrics that take redundancy into account and discuss the effectiveness of redundancy minimization.
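
The abstract does not give the metric definitions, so the following is only an illustrative sketch of what "taking redundancy into account" can mean for sentence extraction; it is not the TSC3 measures themselves. It assumes important sentences are grouped into content units (sentences with the same content share a unit) and that each sentence belongs to at most one unit. The function name, the sentence ids, and the two scores ("coverage" and "precision") are hypothetical choices made for this example.

```python
from typing import Dict, List, Set


def redundancy_aware_scores(
    extracted: List[str], units: Dict[str, Set[str]]
) -> Dict[str, float]:
    """Score an extract against content units (hypothetical formulation).

    `units` maps a content-unit id to the ids of the important sentences
    that express that content. Assumes each sentence is in at most one unit.
    """
    # Invert the annotation: sentence id -> content-unit id.
    sentence_to_unit = {s: u for u, sents in units.items() for s in sents}

    # A unit counts once, no matter how many equivalent sentences were picked.
    covered = {sentence_to_unit[s] for s in extracted if s in sentence_to_unit}

    # Share of annotated content units that the extract conveys.
    coverage = len(covered) / len(units) if units else 0.0
    # Share of extracted sentences that contribute a new content unit;
    # redundant or unimportant sentences lower this value.
    precision = len(covered) / len(extracted) if extracted else 0.0
    return {"coverage": coverage, "precision": precision}


# Toy annotation: d1s1 and d2s3 say the same thing, so they share unit u1.
units = {"u1": {"d1s1", "d2s3"}, "u2": {"d1s4"}, "u3": {"d3s2"}}
scores = redundancy_aware_scores(["d1s1", "d2s3", "d1s4", "d2s9"], units)
print(scores)
```

In this toy case the extract picks both sentences of unit u1, so the second one earns no extra credit: two of three units are covered, and only two of the four extracted sentences add new content.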