Efficient discovery of unusual patterns in time series

  • Authors:
  • Stefano Lonardi;Jessica Lin;Eamonn Keogh;Bill Chiu

  • Affiliations:
  • Department of Computer Science & Engineering, University of California, Riverside, CA;Department of Information and Software Engineering, George Mason University, Fairfax, VA;Department of Computer Science & Engineering, University of California, Riverside, CA;Department of Computer Science & Engineering, University of California, Riverside, CA

  • Venue:
  • New Generation Computing
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

The problem of finding a specified pattern in a time series database (i.e., query by content) has received much attention and is now a relatively mature field. In contrast, the important problem of enumerating all surprising or interesting patterns has received far less attention. This problem requires a meaningful definition of "surprise", and an efficient search technique. All previous attempts at finding surprising patterns in time series use a very limited notion of surprise, and/or do not scale to massive datasets. To overcome these limitations we propose a novel technique that defines a pattern surprising if the frequency of its occurrence differs substantially from that expected by chance, given some previously seen data. This notion has the advantage of not requiring the user to explicitly define what is a surprising pattern, which may be hard, or perhaps impossible, to elicit from a domain expert. Instead, the user gives the algorithm a collection of previously observed "normal" data. Our algorithm uses a suffix tree to efficiently encode the frequency of all observed patterns and allows a Markov model to predict the expected frequency of previously unobserved patterns. Once the suffix tree has been constructed, a measure of surprise for all the patterns in a new database can be determined in time and space linear in the size of the database. We demonstrate the utility of our approach with an extensive experimental evaluation.