Quantizing time series for efficient similarity search under time warping

Authors:
Inés F. Vega-López;Bongki Moon
Affiliations:
School of Informatics, Autonomous University of Sinaloa, Culiacán, Sinaloa, México;Department of Computer Science, University of Arizona, Tucson, AZ
Venue:
ACST'06 Proceedings of the 2nd IASTED international conference on Advances in computer science and technology
Year:
2006

Abstract

Indexing time series data is a problem that has attracted considerable attention from the research community over the last decade. Traditional indexing methods organize the data space using different metrics. For time series, however, there are cases in which a metric is not suited to assessing the similarity between sequences. For instance, to detect similarities between sequences that are locally out of phase, Dynamic Time Warping (DTW) must be used. DTW is not a metric because it does not satisfy the triangle inequality. Therefore, traditional spatial access methods cannot be used without introducing false dismissals. In such cases, alternative methods for organizing and searching time series data are needed. In this paper we propose the use of quantization to generate small and homogeneous representations of time series. Using this quantized representation, we compute upper and lower bounds on the DTW distance to a query sequence and filter out sequences that cannot be a best match for the query. In the proposed approach, efficient search is achieved by organizing the quantized representation of the data in a linear array that can be read efficiently from disk. The computational cost of processing the query is overshadowed by the I/O cost of scanning the file that contains the linear array, so it does not significantly affect the total query cost.
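
The filter-and-refine idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not specify the quantizer or the bound formulas, so the uniform scalar quantizer, the interval-based bounds, the absolute-difference local cost, and all function names (`quantize`, `cell`, `min_cost`, `max_cost`, `nearest_neighbor`) are assumptions made for illustration. The sketch also assumes every value lies in a known range `[lo, hi]`, and it quantizes in memory rather than scanning a precomputed array from disk.

```python
import math

def dtw(a, b, cost):
    """Classic O(len(a) * len(b)) DTW with a pluggable local cost."""
    inf = math.inf
    prev = [0.0] + [inf] * len(b)          # row 0 of the DP table
    for x in a:
        cur = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cur[j] = cost(x, y) + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[len(b)]

def quantize(series, lo, hi, levels):
    """Assumed uniform scalar quantizer: one small integer code per value."""
    width = (hi - lo) / levels
    return [min(levels - 1, max(0, int((x - lo) / width))) for x in series]

def cell(code, lo, hi, levels):
    """Interval of original values that a code can stand for."""
    width = (hi - lo) / levels
    return (lo + code * width, lo + (code + 1) * width)

def min_cost(q, interval):
    """Smallest possible |q - s| when s is only known to lie in interval."""
    low, high = interval
    return max(0.0, low - q, q - high)

def max_cost(q, interval):
    """Largest possible |q - s| when s is only known to lie in interval."""
    low, high = interval
    return max(abs(q - low), abs(q - high))

def nearest_neighbor(query, database, lo, hi, levels):
    """Exact 1-NN under DTW using bounds from the quantized codes as a filter."""
    bounds = []
    for series in database:
        codes = quantize(series, lo, hi, levels)   # would be precomputed
        cells = [cell(k, lo, hi, levels) for k in codes]
        bounds.append((dtw(query, cells, min_cost),   # lower bound
                       dtw(query, cells, max_cost)))  # upper bound
    best_ub = min(ub for _, ub in bounds)
    # A sequence whose lower bound exceeds the smallest upper bound
    # cannot be the best match, so it is never compared exactly.
    candidates = [i for i, (lb, _) in enumerate(bounds) if lb <= best_ub]
    exact = {i: dtw(query, database[i], lambda x, y: abs(x - y))
             for i in candidates}
    winner = min(exact, key=exact.get)
    return winner, exact[winner], candidates
```

Because each interval cost brackets the true local cost for every pair of points, the two bounds bracket the true DTW distance, so pruning on them introduces no false dismissals. In this sketch the bounds cost as much to compute as DTW itself; the saving it illustrates is that the original sequences are fetched only for the surviving candidates.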