A Subsequence Matching Algorithm that Supports Normalization Transform in Time-Series Databases

Authors:
Woong-Kee Loh;Sang-Wook Kim;Kyu-Young Whang
Affiliations:
Department of Computer Science & Advanced Information Technology Research Center (AITrc), Korea Advanced Institute of Science and Technology (KAIST), Korea;College of Information and Communications, Hanyang University, Korea;Department of Computer Science & Advanced Information Technology Research Center (AITrc), Korea Advanced Institute of Science and Technology (KAIST), Korea
Venue:
Data Mining and Knowledge Discovery
Year:
2004

Abstract

This paper proposes a subsequence matching algorithm that supports the normalization transform in time-series databases. The normalization transform makes it possible to find sequences with similar fluctuation patterns even when they are not close to each other before the transform. Existing subsequence matching algorithms cannot simply be applied to support the normalization transform, because they lack the information needed to normalize subsequences of arbitrary lengths. The existing whole matching algorithm that supports the normalization transform can be applied to subsequence matching, but it requires an index for every possible query sequence length, which incurs serious overhead in both storage space and update time. The proposed algorithm builds indexes for only a small number of query sequence lengths and, for each subsequence matching query, selects the most appropriate index among them. This approach is called index interpolation. It is formally proved that the proposed algorithm causes no false dismissal. Using more indexes yields better search performance, so search performance can be traded off against storage space by adjusting the number of indexes. For performance evaluation, a series of experiments is conducted using indexes for only five different lengths out of the query sequence lengths 256 to 512. The results show that the proposed algorithm outperforms the sequential scan by up to 2.4 times on average when the selectivity of the query is 10^-2 and by up to 14.6 times when it is 10^-5. Because the proposed algorithm performs better at smaller selectivities, it is suitable for practical situations, where queries with smaller selectivities are much more frequent.
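
To make the role of the normalization transform concrete, the following is a minimal Python sketch of the sequential-scan baseline that the abstract compares against. It is not the proposed index interpolation algorithm. It assumes that the normalization transform subtracts the mean of a sequence and divides by its standard deviation, and that similarity is measured by Euclidean distance within a tolerance; the abstract states neither. The function names (normalize, euclidean, scan_normalized_matches) and the tolerance parameter eps are hypothetical.

```python
import math


def normalize(seq):
    """Assumed normalization transform: subtract the mean, divide by the
    (population) standard deviation."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0:
        # Assumption: a constant sequence is mapped to all zeros.
        return [0.0] * n
    return [(x - mean) / std for x in seq]


def euclidean(a, b):
    """Euclidean distance between two sequences of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def scan_normalized_matches(data, query, eps):
    """Sequential scan: return every offset i such that the normalized
    subsequence data[i:i+len(query)] lies within eps of the normalized query."""
    m = len(query)
    nq = normalize(query)
    return [
        i
        for i in range(len(data) - m + 1)
        if euclidean(normalize(data[i:i + m]), nq) <= eps
    ]


# Two subsequences share the query's shape at very different scales.
data = [1, 2, 3, 2, 1, 10, 20, 30, 20, 10]
query = [100, 200, 300, 200, 100]
matches = scan_normalized_matches(data, query, eps=1e-9)
```

In this example the raw Euclidean distance between the query and either matching subsequence is large, yet both are found after normalization because they have the same fluctuation pattern. The scan must normalize every subsequence of the query's length, which is the cost the indexes in the proposed algorithm are meant to avoid.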