Summarizing Evolving Data Streams using Dynamic Prefix Trees

  • Authors:
  • Carlos Rojas; Olfa Nasraoui

  • Venue:
  • WI '07 Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence
  • Year:
  • 2007

Abstract

In stream data mining, it is important to use the most recent data to cope with the evolving nature of the underlying patterns. Simply keeping the most recent records offers no flexibility about which data is kept and does not exploit even minimal redundancies in the data (a first step towards pattern discovery). This paper focuses on how to efficiently construct and maintain, in one pass, a compact summary for data such as web logs and text streams. The resulting structure is a prefix tree with an ordering criterion that changes with time, such as an activity time stamp or attribute frequency. A detailed analysis of the factors that affect its performance is carried out, including empirical evaluations using the well-known 20 Newsgroups data set. Guidelines for forgetting and tree pruning are also provided. Finally, we use this data structure to discover evolving topics from the 20 Newsgroups data set.
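
To make the idea in the abstract more concrete, the following is a minimal sketch of a prefix tree whose item ordering follows the running attribute frequency, with a time stamp on each node used for forgetting. It is not the authors' algorithm: the class names `Node` and `DynamicPrefixTree`, the tie-breaking rule, and the age-based `prune` policy are all assumptions chosen for illustration.

```python
from collections import Counter


class Node:
    """One tree node: an item, how often its prefix was seen, and when last."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.last_seen = -1
        self.children = {}


class DynamicPrefixTree:
    """Hypothetical one-pass summary; the ordering criterion is running frequency."""

    def __init__(self):
        self.root = Node()
        self.freq = Counter()  # running item frequencies (assumed criterion)
        self.clock = 0         # logical time: one tick per inserted record

    def insert(self, record):
        self.clock += 1
        items = set(record)
        self.freq.update(items)
        # Order by current frequency (most frequent first), ties broken by item.
        # Because frequencies change, the same record may follow a different
        # path later in the stream.
        ordered = sorted(items, key=lambda i: (-self.freq[i], i))
        node = self.root
        for item in ordered:
            node = node.children.setdefault(item, Node(item))
            node.count += 1
            node.last_seen = self.clock

    def prune(self, max_age):
        """Forget every subtree not touched within the last `max_age` ticks."""
        cutoff = self.clock - max_age

        def _prune(node):
            for item in list(node.children):
                child = node.children[item]
                if child.last_seen < cutoff:
                    del node.children[item]
                else:
                    _prune(child)

        _prune(self.root)

    def size(self):
        """Number of nodes, excluding the root."""

        def _count(node):
            return sum(1 + _count(c) for c in node.children.values())

        return _count(self.root)


tree = DynamicPrefixTree()
for record in (["a", "b"], ["a", "c"], ["a", "b"]):
    tree.insert(record)
tree.prune(max_age=0)  # keep only nodes touched by the latest record
```

Records that share frequent items share a path from the root, which is where the compression comes from; pruning by `last_seen` is one simple way to bias the summary towards recent data, and a frequency threshold on `count` would be an equally plausible alternative.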