Scalable similarity search of timeseries with variable dimensionality

Authors:
Omar U. Florez; Curtis Dyreson
Affiliations:
Utah State University, Logan, UT, USA; Utah State University, Logan, UT, USA
Venue:
Proceedings of the 20th ACM international conference on Information and knowledge management
Year:
2011

References:

R-trees: a dynamic index structure for spatial searching
SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data

Exact indexing of dynamic time warping
Knowledge and Information Systems

Scaling and time warping in time series querying
VLDB '05 Proceedings of the 31st international conference on Very large data bases

Fast agglomerative hierarchical clustering algorithm using Locality-Sensitive Hashing
Knowledge and Information Systems

The TS-tree: efficient time series search and retrieval
EDBT '08 Proceedings of the 11th international conference on Extending database technology: Advances in database technology

iSAX: indexing and mining terabyte sized time series
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining

Sublinear querying of realistic timeseries and its application to human motion
Proceedings of the international conference on Multimedia information retrieval

Abstract

Timeseries can be similar in shape but differ in length. For example, the sound waves produced by the same word spoken twice have roughly the same shape, but one may be shorter in duration. Stream data mining, approximate querying of image and video databases, data compression, and near-duplicate detection all need to classify or cluster such timeseries, and to search for and rank timeseries that are similar to a chosen timeseries. We demonstrate software for clustering and similarity search in databases of timeseries with high and variable dimensionality. The demonstration uses Timeseries Sensitive Hashing (TSH) [3] to index the timeseries. TSH adapts Locality Sensitive Hashing (LSH), an approximate algorithm for indexing data points in a d-dimensional space under some distance function (e.g., Euclidean). Unlike LSH, TSH can index points that do not all have the same dimensionality. To show the potential of TSH, the demonstration will index and classify timeseries from an image database, as well as timeseries describing human motion extracted from a video stream and from a motion capture system.
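
To illustrate the LSH baseline that the abstract says TSH adapts, the following is a minimal Python sketch of one common LSH scheme for Euclidean distance: random projections quantized into buckets. It is not the authors' TSH implementation, and nothing here is taken from their software. The class name EuclideanLSH, the parameters num_hashes and bucket_width, and all default values are assumptions chosen for illustration. The index rejects any point whose length differs from dim, which is the fixed-dimensionality restriction the abstract describes for LSH.

```python
import math
import random
from collections import defaultdict


class EuclideanLSH:
    """Single-table LSH sketch for Euclidean distance (illustrative only)."""

    def __init__(self, dim, num_hashes=4, bucket_width=4.0, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.bucket_width = bucket_width
        # Each hash is h(v) = floor((a . v + b) / w), with Gaussian a
        # and an offset b drawn uniformly from [0, w).
        self.projections = [
            [rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_hashes)
        ]
        self.offsets = [rng.uniform(0.0, bucket_width) for _ in range(num_hashes)]
        self.buckets = defaultdict(list)

    def key(self, point):
        """Concatenate the quantized projections into one bucket key."""
        if len(point) != self.dim:
            # Plain LSH needs every point to have the same dimensionality.
            raise ValueError(f"expected {self.dim} values, got {len(point)}")
        return tuple(
            math.floor(
                (sum(a_i * x_i for a_i, x_i in zip(a, point)) + b)
                / self.bucket_width
            )
            for a, b in zip(self.projections, self.offsets)
        )

    def insert(self, label, point):
        """Store a labeled point in the bucket selected by its key."""
        self.buckets[self.key(point)].append((label, list(point)))

    def query(self, point):
        """Return the points sharing the query's bucket, nearest first."""
        candidates = self.buckets.get(self.key(point), [])
        return sorted(candidates, key=lambda item: math.dist(item[1], point))
```

A caller would build the index with EuclideanLSH(dim=3), add points with insert("a", [1.0, 2.0, 3.0]), and retrieve candidates with query([1.0, 2.0, 3.1]). Because the scheme is approximate, a nearby point is likely, but not guaranteed, to land in the same bucket as the query. Practical deployments typically use several tables to raise recall.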