Summarizing a document stream

  • Authors:
  • Hiroya Takamura;Hikaru Yokono;Manabu Okumura

  • Affiliations:
  • Precision and Intelligence Laboratory, Tokyo Institute of Technology;Precision and Intelligence Laboratory, Tokyo Institute of Technology;Precision and Intelligence Laboratory, Tokyo Institute of Technology

  • Venue:
  • ECIR'11 Proceedings of the 33rd European conference on Advances in information retrieval
  • Year:
  • 2011

Abstract

We introduce the task of summarizing a stream of short documents on microblogs such as Twitter. On microblogs, users post thousands of short documents on a given topic, such as a sports match or a TV drama. Two notable characteristics of microblog data are that documents are often highly redundant and that they are aligned on a timeline. There can be thousands of documents on a single event within a topic, and two very similar documents will refer to two distinct events when they are temporally distant. We examine microblog data to better understand these characteristics, and propose a summarization model for a stream of short documents on a timeline, along with a fast approximate algorithm for generating a summary. We empirically show that our model generates good summaries on datasets of microblog documents about sports matches.
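
The abstract does not spell out the model, so the following is only an illustrative sketch of the two characteristics it names (redundancy and temporal alignment), not the authors' method. It assumes a hypothetical `Post` tuple of (timestamp, text), a simple word-overlap (Jaccard) similarity, a hard time `window` beyond which two posts are treated as unrelated, and a greedy coverage heuristic standing in for the paper's approximate algorithm.

```python
from typing import List, Tuple

# Hypothetical representation: (timestamp in seconds, text of the post).
Post = Tuple[float, str]


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two texts (case-insensitive)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def time_aware_similarity(p: Post, q: Post, window: float) -> float:
    """Lexical similarity, forced to zero when the posts are more than
    `window` seconds apart (assumed to describe different events)."""
    if abs(p[0] - q[0]) > window:
        return 0.0
    return jaccard(p[1], q[1])


def greedy_summary(posts: List[Post], k: int, window: float) -> List[Post]:
    """Greedily pick up to k posts. Each post in the stream is 'covered' by
    its most similar selected post; each step adds the candidate with the
    largest increase in total coverage, and stops early if nothing helps."""
    selected: List[int] = []
    best = [0.0] * len(posts)  # best coverage achieved so far for each post
    for _ in range(min(k, len(posts))):
        best_gain, best_idx = 0.0, None
        for i, cand in enumerate(posts):
            if i in selected:
                continue
            gain = sum(
                max(time_aware_similarity(cand, p, window) - best[j], 0.0)
                for j, p in enumerate(posts)
            )
            if gain > best_gain:
                best_gain, best_idx = gain, i
        if best_idx is None:
            break
        selected.append(best_idx)
        best = [
            max(best[j], time_aware_similarity(posts[best_idx], p, window))
            for j, p in enumerate(posts)
        ]
    # Return the summary in timeline order.
    return sorted((posts[i] for i in selected), key=lambda p: p[0])


posts = [
    (0.0, "goal by team A"),
    (5.0, "goal by team A"),
    (10.0, "GOAL by team a"),
    (3600.0, "goal by team A"),
    (3605.0, "goal by team A"),
]
summary = greedy_summary(posts, k=2, window=60.0)
```

With a 60-second window, the near-identical posts an hour apart are treated as two separate events, so the sketch keeps one representative of each. With a very large window they collapse into a single event and one post suffices.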