A novel bit level time series representation with implication of similarity search and clustering

  • Authors:
  • Chotirat Ratanamahatana;Eamonn Keogh;Anthony J. Bagnall;Stefano Lonardi

  • Affiliations:
  • Dept. of Computer Science & Engineering, Univ. of California, Riverside, CA;Dept. of Computer Science & Engineering, Univ. of California, Riverside, CA;School of Computing Sciences, University of East Anglia, Norwich, UK;Dept. of Computer Science & Engineering, Univ. of California, Riverside, CA

  • Venue:
  • PAKDD'05 Proceedings of the 9th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2005

Abstract

Because time series are a ubiquitous and increasingly prevalent type of data, there has been much research effort devoted to time series data mining recently. As with all data mining problems, the key to effective and scalable algorithms is choosing the right representation of the data. Many high level representations of time series have been proposed for data mining. In this work, we introduce a new technique based on a bit level approximation of the data. The representation has several important advantages over existing techniques. One unique advantage is that it allows raw data to be directly compared to the reduced representation, while still guaranteeing lower bounds to Euclidean distance. This fact can be exploited to produce faster exact algorithms for similarity search. In addition, we demonstrate that our new representation allows time series clustering to scale to much larger datasets.
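
To make the lower-bounding claim concrete, here is a minimal Python sketch of one way a raw query can be compared against a bit level approximation. The abstract does not specify the encoding, so the sketch assumes the simplest one: each point of a mean-centred series is stored as a single bit recording whether it lies above zero. The function names (`center`, `to_bits`, `lower_bound`, `euclidean`) are hypothetical and chosen only for illustration. Under this assumption, whenever the sign of a query point disagrees with the stored bit, the true gap at that position is at least the magnitude of the query value, which yields a valid lower bound on the Euclidean distance.

```python
import math

def center(series):
    """Subtract the mean so the bit threshold sits at zero (an assumption of this sketch)."""
    mean = sum(series) / len(series)
    return [x - mean for x in series]

def to_bits(centered):
    """Assumed encoding: one bit per point, 1 if the centred value is above zero, else 0."""
    return [1 if x > 0 else 0 for x in centered]

def lower_bound(query, bits):
    """Lower bound on the Euclidean distance between a raw query and a bit-encoded series.

    If the bit says the hidden value is > 0 but the query value is <= 0 (or the
    reverse), the two values differ by at least |q|, so q*q is a safe contribution.
    Positions where the signs agree contribute nothing.
    """
    total = 0.0
    for q, b in zip(query, bits):
        if (b == 1 and q <= 0) or (b == 0 and q > 0):
            total += q * q
    return math.sqrt(total)

def euclidean(a, b):
    """Exact Euclidean distance between two equal-length raw series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Illustrative values only, not data from the paper.
query = center([1.0, 3.0, 2.0, 6.0])
candidate = center([5.0, 1.0, 4.0, 2.0])
bits = to_bits(candidate)

# The bound never exceeds the true distance, so it can prune candidates safely.
assert lower_bound(query, bits) <= euclidean(query, candidate)
```

In an exact similarity search, a bound of this kind would typically be used as a filter: if the lower bound to a candidate already exceeds the best distance found so far, the candidate's raw data never needs to be read.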