Using multiple indexes for efficient subsequence matching in time-series databases

Authors:
Seung-Hwan Lim; Heejin Park; Sang-Wook Kim
Affiliations:
College of Information and Communications, Hanyang University, Republic of Korea; College of Information and Communications, Hanyang University, Republic of Korea; College of Information and Communications, Hanyang University, Republic of Korea
Venue:
Information Sciences: an International Journal
Year:
2007

Abstract

A time-series database is a set of data sequences, each of which is a list of changing values of an object over a given period of time. Subsequence matching is an operation that searches a time-series database for data subsequences whose changing patterns are similar to a query sequence. This paper addresses a performance issue in time-series subsequence matching. First, we quantitatively examine the performance degradation caused by the window size effect, that is, the degradation that grows as the query length becomes larger than the window size used to build the index, and then show that the performance of subsequence matching with a single index is not satisfactory in real applications. We claim that index interpolation is a fairly effective tool for solving this problem. Index interpolation performs subsequence matching by selecting the most appropriate index from multiple indexes built on windows of distinct sizes. For index interpolation, we need to decide the window sizes of the multiple indexes to be built. In this paper, we solve the problem of selecting optimal window sizes from the perspective of physical database design. Given a set of pairs of query sequences to be performed in a target application and a set of window sizes for building multiple indexes, we devise a formula that estimates the overall cost of all the subsequence matchings performed in the target application. Using this formula, we propose an algorithm that determines the window sizes that maximize the overall performance of subsequence matching. We formally prove the optimality as well as the effectiveness of the algorithm. Finally, we demonstrate the superiority of our approach through extensive experiments with a real-life stock data set and a large volume of synthetic data sets.
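The abstract names index interpolation and window-size selection but does not give the paper's cost formula or algorithm. The following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes: (1) the classic FRM-style query decomposition for Euclidean distance, where a query of length `query_len` is split into p = floor(query_len / w) disjoint windows, each searched with radius eps / sqrt(p); (2) a hypothetical selection rule that picks the largest indexed window size that fits in the query; (3) a hypothetical workload given as (query length, frequency) pairs; and (4) a caller-supplied placeholder `cost` function. The window sizes are chosen by plain exhaustive search over k-subsets of candidates. All function names here are illustrative.

```python
from __future__ import annotations

import math
from itertools import combinations
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical workload format (an assumption): (query length, frequency) pairs.
Workload = Sequence[Tuple[int, int]]
CostFn = Callable[[int, int], float]  # placeholder cost(query_len, window_size)


def frm_query_windows(query_len: int, w: int, eps: float) -> Tuple[int, float]:
    """FRM-style split: p = floor(query_len / w) disjoint query windows,
    each searched with radius eps / sqrt(p) under Euclidean distance."""
    if not 0 < w <= query_len:
        raise ValueError("window size must be in (0, query_len]")
    p = query_len // w
    return p, eps / math.sqrt(p)


def pick_index(query_len: int, window_sizes: Sequence[int]) -> Optional[int]:
    """Assumed interpolation rule: use the largest indexed window size that
    still fits inside the query; None if no index can serve the query."""
    fitting = [w for w in window_sizes if w <= query_len]
    return max(fitting) if fitting else None


def total_cost(workload: Workload, window_sizes: Sequence[int], cost: CostFn) -> float:
    """Frequency-weighted cost of the whole workload for one set of indexes."""
    total = 0.0
    for query_len, freq in workload:
        w = pick_index(query_len, window_sizes)
        if w is None:
            return math.inf  # some query cannot be served by any index
        total += freq * cost(query_len, w)
    return total


def best_window_sizes(
    workload: Workload, candidates: Sequence[int], k: int, cost: CostFn
) -> Tuple[Optional[Tuple[int, ...]], float]:
    """Exhaustively try every k-subset of candidate window sizes (a brute-force
    stand-in, not the paper's algorithm) and keep the cheapest one."""
    best, best_cost = None, math.inf
    for combo in combinations(sorted(candidates), k):
        c = total_cost(workload, combo, cost)
        if c < best_cost:
            best, best_cost = combo, c
    return best, best_cost


# Toy cost (an assumption): query_len / w, a crude proxy for the window size effect.
workload = [(64, 5), (128, 3), (256, 1)]
print(best_window_sizes(workload, [32, 64, 128, 256], 2, lambda n, w: n / w))
# -> ((64, 128), 10.0)
```

Exhaustive search is exponential in the number of candidate sizes, so it only serves to make the optimization problem concrete; a real cost model would replace the toy `n / w` proxy.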