Optimizing Similarity Search for Arbitrary Length Time Series Queries

Authors:
Tamer Kahveci; Ambuj K. Singh
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2004



Abstract

We consider the problem of finding similar patterns in a time sequence. Typical applications of this problem involve large databases consisting of long time sequences of different lengths. Current time sequence search techniques work well for queries of a prespecified length, but not for arbitrary length queries. We propose a novel indexing technique that works well for arbitrary length queries. The proposed technique stores index structures at different resolutions for a given data set. We prove that this index structure is superior to existing index structures that use a single resolution. We propose a range query technique and a nearest neighbor query technique on this index structure and prove the optimality of our index structure for these search techniques. The experimental results show that our method is 4 to 20 times faster than the current techniques, including Sequential Scan, for range queries and 3 times faster than Sequential Scan and other techniques for nearest neighbor queries. Because of the need to store information at multiple resolution levels, the storage requirement of our method could potentially be large. In the second part of the paper, we show how the index information can be compressed with minimal information loss. According to our experimental results, even after compressing the index to one-fifth of its size, the total cost of our method is 3 to 15 times lower than that of the current techniques.
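The abstract does not spell out how a query of arbitrary length is matched against structures kept at several resolutions. The sketch below illustrates one plausible reading and is not the authors' algorithm: it assumes the resolutions are window lengths that are powers of two, splits the query into pieces of those lengths, and prunes a candidate as soon as the accumulated squared Euclidean distance exceeds the squared radius. The names `power_of_two_pieces`, `sq_dist`, and `range_query` are hypothetical, and the code scans the raw series where a real system would consult an index.

```python
def power_of_two_pieces(length):
    """Split a positive length into descending powers of two (its binary expansion)."""
    if length <= 0:
        raise ValueError("length must be positive")
    pieces = []
    power = 1 << (length.bit_length() - 1)  # largest power of two <= length
    while length > 0:
        if length >= power:
            pieces.append(power)
            length -= power
        power >>= 1
    return pieces


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def range_query(series, query, eps):
    """Return start offsets whose subsequence lies within distance eps of the query.

    Assumption for illustration: each piece is compared against raw data.
    The squared distance over the whole query is the sum over disjoint pieces,
    so a candidate can be dropped once the remaining budget goes negative.
    """
    pieces = power_of_two_pieces(len(query))
    hits = []
    for start in range(len(series) - len(query) + 1):
        budget = eps * eps  # squared radius still available
        offset = 0
        for p in pieces:
            window = series[start + offset:start + offset + p]
            budget -= sq_dist(window, query[offset:offset + p])
            if budget < 0:
                break  # this candidate cannot be within eps
            offset += p
        else:
            hits.append(start)
    return hits


if __name__ == "__main__":
    data = [0, 1, 2, 3, 4, 5, 4, 3, 2, 1, 0]
    pattern = [3, 4, 5, 4, 3]
    print(power_of_two_pieces(len(pattern)))
    print(range_query(data, pattern, 2.5))
```

In an index-based version of this idea, each per-piece distance would be replaced by a lower bound read from the structure stored at the matching resolution, and surviving candidates would then be verified against the raw data.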