Mining and ranking streams of news stories using cross-stream sequential patterns

  • Authors:
  • Robert Gwadera;Fabio Crestani

  • Affiliations:
  • Università della Svizzera Italiana, Lugano, Switzerland;Università della Svizzera Italiana, Lugano, Switzerland

  • Venue:
  • Proceedings of the 18th ACM conference on Information and knowledge management
  • Year:
  • 2009

Abstract

We present a new method for mining and ranking streams of news stories using cross-stream sequential patterns and content similarity. In particular, we focus on stories reporting the same event across the streams within a given time window, where an event is defined as a specific thing that happens at a specific time and place. For every discovered cluster of stories reporting the same event, we create an itemset-sequence consisting of the stream identifiers of the stories in the cluster, ordered according to the timestamps of the stories. Furthermore, we record the exact timestamps and the content similarities between the respective stories. We use the resulting collection of itemset-sequences for two tasks: (I) to discover recurrent temporal publishing patterns between the news streams in terms of frequent sequential patterns and content similarity and (II) to rank the streams of news stories with respect to timeliness of reporting important events and content authority. We demonstrate the applicability of the presented method on a multi-stream of news stories gathered from RSS feeds of major world news agencies.
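
To make the itemset-sequence construction concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a cluster is given as (stream identifier, timestamp) pairs, that stories with identical timestamps fall into the same itemset, and that pattern support is counted by plain subsequence containment with itemset inclusion. The function names itemset_sequence, contains, and support are hypothetical, and the content-similarity component of the method is omitted.

```python
from collections import defaultdict


def itemset_sequence(cluster):
    """Turn one cluster of stories about the same event into an itemset-sequence.

    `cluster` is a list of (stream_id, timestamp) pairs. Assumption: stories
    sharing a timestamp are grouped into one itemset; itemsets are then
    ordered by timestamp.
    """
    by_time = defaultdict(set)
    for stream_id, timestamp in cluster:
        by_time[timestamp].add(stream_id)
    return [frozenset(by_time[t]) for t in sorted(by_time)]


def contains(sequence, pattern):
    """Return True if `pattern` occurs in `sequence` as a subsequence,
    where each pattern itemset must be a subset of a later-or-equal
    sequence itemset (greedy left-to-right matching)."""
    matched = 0
    for itemset in sequence:
        if matched < len(pattern) and pattern[matched] <= itemset:
            matched += 1
    return matched == len(pattern)


def support(sequences, pattern):
    """Count how many itemset-sequences contain the sequential pattern."""
    return sum(contains(s, pattern) for s in sequences)


# Hypothetical streams "A", "B", "C" with integer timestamps (e.g. seconds).
event_1 = itemset_sequence([("A", 10), ("B", 20), ("C", 10)])
event_2 = itemset_sequence([("B", 5), ("A", 7)])
a_then_b = [frozenset({"A"}), frozenset({"B"})]
a_before_b_count = support([event_1, event_2], a_then_b)
```

In this toy input, stream A reports before stream B in only one of the two events, so a pattern such as "A, then B" would need a minimum support threshold of 1 to count as frequent. How the threshold is chosen, and how timestamps and content similarity enter the ranking, is not specified in the abstract.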