Mining news streams using cross-stream sequential patterns

  • Authors:
  • Robert Gwadera; Fabio Crestani

  • Affiliations:
  • Università della Svizzera italiana, Lugano, Switzerland; Università della Svizzera italiana, Lugano, Switzerland

  • Venue:
  • RIAO '10: Adaptivity, Personalization and Fusion of Heterogeneous Information
  • Year:
  • 2010

Abstract

We present a new method for mining streams of news stories using cross-stream sequential patterns. We cluster stories that report the same event across the streams within a given time window. For every discovered cluster, we create an itemset-sequence consisting of the stream identifiers of the stories in the cluster, ordered by the stories' timestamps. For every such itemset-sequence, we also record the exact timestamps and the content similarities between the respective stories. The resulting collection of itemset-sequences serves two tasks: (I) discovering cross-stream dependencies in terms of frequent sequential publishing patterns and content similarity, and (II) ranking the streams of news stories with respect to their timeliness in reporting important events and their content authority. We tested the applicability of the method on a collection of news streams gathered from major world news agencies.
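The pipeline described in the abstract can be illustrated with a minimal sketch. It assumes the event clusters have already been found, represents each story only by a stream identifier and an integer timestamp, and groups stories with identical timestamps into one itemset (an assumption about tie handling). The support of ordered stream pairs stands in for frequent sequential patterns, and a "first reporter" count stands in for timeliness ranking. These are simplified, hypothetical substitutes chosen for illustration, not the authors' mining or ranking algorithms, and all names (`Story`, `itemset_sequence`, `ordered_pair_support`, `first_reporter_rank`) are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Story:
    stream: str     # hypothetical stream identifier, e.g. a news agency name
    timestamp: int  # publication time, e.g. seconds since some epoch


def itemset_sequence(cluster):
    """Turn one event cluster into a time-ordered list of itemsets.

    Assumption: stories published at the same timestamp share one itemset.
    """
    by_time = {}
    for story in cluster:
        by_time.setdefault(story.timestamp, set()).add(story.stream)
    return [frozenset(by_time[t]) for t in sorted(by_time)]


def ordered_pair_support(sequences):
    """Count in how many sequences stream a strictly precedes stream b.

    A simplified stand-in for frequent sequential pattern mining:
    only length-2 patterns are considered.
    """
    support = Counter()
    for seq in sequences:
        pairs = set()  # count each pattern at most once per sequence
        for i, j in combinations(range(len(seq)), 2):  # i < j
            for a in seq[i]:
                for b in seq[j]:
                    pairs.add((a, b))
        support.update(pairs)
    return support


def first_reporter_rank(sequences):
    """Rank all streams by how often they occur in the first itemset.

    A hypothetical timeliness proxy; ties are broken alphabetically.
    """
    streams = {s for seq in sequences for itemset in seq for s in itemset}
    firsts = Counter()
    for seq in sequences:
        if seq:
            firsts.update(seq[0])
    return sorted(streams, key=lambda s: (-firsts[s], s))


if __name__ == "__main__":
    # Toy clusters with made-up stream names and timestamps.
    clusters = [
        [Story("reuters", 100), Story("ap", 160), Story("bbc", 160)],
        [Story("reuters", 50), Story("bbc", 90)],
        [Story("ap", 10), Story("reuters", 40)],
    ]
    seqs = [itemset_sequence(c) for c in clusters]
    print(ordered_pair_support(seqs)[("reuters", "bbc")])  # 2
    print(first_reporter_rank(seqs))  # ['reuters', 'ap', 'bbc']
```

A faithful version would also keep the recorded timestamps and content similarities alongside each itemset-sequence, so that patterns could be filtered or weighted by publishing delay and similarity rather than counted by order alone.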