Cross-document summarization by concept classification

  • Authors:
  • Hilda Hardy; Nobuyuki Shimizu; Tomek Strzalkowski; Liu Ting; Xinyang Zhang; G. Bowden Wise

  • Affiliations:
  • NLIP Laboratory, University at Albany, Albany, NY (first five authors); GE Global Research Center, Niskayuna, NY (G. Bowden Wise)

  • Venue:
  • SIGIR '02: Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval
  • Year:
  • 2002

Abstract

In this paper we describe XDoX, a cross-document summarizer designed specifically to summarize large document sets (50-500 documents or more). Such sets are typically obtained from routing or filtering systems run against a continuous stream of data, such as a newswire. XDoX works by identifying the most salient themes within the set, at a level of granularity set by the user, and composing an extraction summary that reflects these main themes. In the current version, XDoX is not optimized to produce a summary from a few unrelated documents; such summaries are best obtained simply by concatenating summaries of the individual documents. We show examples of summaries obtained in our tests as well as from our participation in the first Document Understanding Conference (DUC).
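
The abstract does not say how themes are identified or how passages are selected, so the following is only a minimal sketch of the general idea (group passages into themes, rank themes by size, extract one passage per theme). It is not the XDoX method: greedy clustering on Jaccard term overlap is a stand-in chosen for brevity, and every name here (`group_into_themes`, `summarize`, the `granularity` threshold, the stopword list) is hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "and", "for", "with", "from", "after", "that", "this"}


def tokens(text):
    """Lowercased content words longer than two letters, as a set."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if len(w) > 2 and w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard overlap of two term sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def group_into_themes(passages, granularity=0.2):
    """Greedy one-pass clustering of passages into themes.

    A passage joins the most similar existing theme if the similarity
    reaches `granularity`; otherwise it starts a new theme. A higher
    threshold therefore yields more, finer-grained themes.
    """
    themes = []  # each theme: {"terms": set of words, "members": [indices]}
    for i, passage in enumerate(passages):
        terms = tokens(passage)
        best, best_sim = None, 0.0
        for theme in themes:
            sim = jaccard(terms, theme["terms"])
            if sim > best_sim:
                best, best_sim = theme, sim
        if best is not None and best_sim >= granularity:
            best["members"].append(i)
            best["terms"] |= terms
        else:
            themes.append({"terms": set(terms), "members": [i]})
    return themes


def summarize(passages, granularity=0.2, max_themes=3):
    """Return one representative passage for each of the largest themes."""
    themes = group_into_themes(passages, granularity)
    # Largest themes first; the sort is stable, so ties keep input order.
    themes.sort(key=lambda th: len(th["members"]), reverse=True)
    summary = []
    for theme in themes[:max_themes]:
        # Count how many member passages contain each term.
        counts = Counter()
        for i in theme["members"]:
            counts.update(tokens(passages[i]))

        def score(i):
            # Average theme-wide frequency of the passage's terms.
            terms = tokens(passages[i])
            return sum(counts[t] for t in terms) / max(len(terms), 1)

        summary.append(passages[max(theme["members"], key=score)])
    return summary
```

In this sketch, `granularity` plays the role of the user-regulated granularity level mentioned in the abstract: raising it splits the set into more, narrower themes, while lowering it merges passages into fewer, broader ones.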