Finding relevant patterns in bursty sequences

  • Authors:
  • Alexander Lachmann;Mirek Riedewald

  • Affiliations:
  • RWTH, Aachen, Germany;Cornell University, Ithaca, New York

  • Venue:
  • Proceedings of the VLDB Endowment
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

Sequence data is ubiquitous and finding frequent sequences in a large database is one of the most common problems when analyzing sequence data. Unfortunately many sources of sequence data, e.g., sensor networks for data-driven science, RFID-based supply chain monitoring, and computing system monitoring infrastructure, produce a challenging workload for sequence mining. It is common to find bursts of events of the same type. Such bursts result in high mining cost, because input sequences are longer. An even greater challenge is that these bursts tend to produce an overwhelming number of irrelevant repetitive sequence patterns with high support. Simply raising the support threshold is not a solution, because at some point interesting sequences will get eliminated. As an alternative we propose a novel transformation of the input sequences. We show that this transformation has several desirable properties. First, the transformed data can still be mined with existing sequence mining algorithms. Second, for a given support threshold the mining result can often be obtained much faster and it is usually much smaller and easier to interpret. Third, and most importantly, we show that the result sequences retain the important characteristics of the sequences that would have been found in the original (not transformed) data. We validate our technique with an experimental study using synthetic and real data.