Efficient processing of similarity search under time warping in sequence databases: an index-based approach

Authors:
Sang-Wook Kim;Sanghyun Park;Wesley W. Chu
Affiliations:
College of Information and Communications, Hanyang University, South Korea;Department of Computer Science and Engineering, Pohang University of Science and Technology (POSTECH), South Korea;Department of Computer Science, University of California at Los Angeles (UCLA)
Venue:
Information Systems - Databases: Creation, management and utilization
Year:
2004

Abstract

This paper discusses the effective processing of similarity search that supports time warping in large sequence databases. Time warping enables sequences with similar patterns to be found even when they are of different lengths. Prior methods for similarity search under time warping could not employ multi-dimensional indexes without false dismissal, since the time warping distance does not satisfy the triangle inequality. They have to scan the entire database and therefore suffer serious performance degradation in large databases. Another method, which employs a suffix tree and does not assume any particular distance function, also performs poorly because of the large tree size.

In this paper, we propose a novel method for similarity search that supports time warping. Our primary goal is to enhance search performance in large databases without permitting any false dismissal. To attain this goal, we have devised a new distance function, Dtw-lb, which consistently underestimates the time warping distance and satisfies the triangle inequality. Dtw-lb uses a 4-tuple feature vector that is extracted from each sequence and is invariant to time warping. For the efficient processing of similarity search, we employ a multi-dimensional index that uses the 4-tuple feature vector as indexing attributes and Dtw-lb as a distance function. We prove that our method does not incur false dismissal. To verify the superiority of our method, we have performed extensive experiments. The results reveal that our method achieves a significant speedup: up to 43 times on a data set containing real-world S&P 500 stock data sequences, and up to 720 times on data sets containing a very large volume of synthetic data sequences. The performance gain increases (1) as the number of data sequences increases, (2) as the average length of data sequences increases, and (3) as the tolerance in a query decreases. Considering the characteristics of real databases, these tendencies imply that our approach is suitable for practical applications.
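
The filter-and-refine idea behind Dtw-lb can be illustrated with a minimal Python sketch. The abstract does not name the four features or the exact recurrence of the time warping distance, so the sketch rests on assumptions: the feature vector is taken to be the first, last, greatest, and smallest elements of a sequence, the time warping distance aggregates element differences with a maximum, and the lower bound is the largest absolute difference between corresponding features. All function names are hypothetical, and a linear scan over feature vectors stands in for the multi-dimensional index.

```python
def dtw_distance(s, q):
    """Time warping distance between two non-empty sequences.

    Assumed variant: the cost of a warping path is the largest
    element-wise difference |s[i] - q[j]| along the path.
    """
    n, m = len(s), len(q)
    inf = float("inf")
    # d[i][j] holds the distance between the prefixes s[:i] and q[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - q[j - 1])
            best_prev = min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
            d[i][j] = max(cost, best_prev)
    return d[n][m]


def features(seq):
    """Assumed 4-tuple feature vector: (first, last, greatest, smallest).

    None of these values changes when elements are repeated, which is
    what makes the vector invariant to time warping.
    """
    return (seq[0], seq[-1], max(seq), min(seq))


def lower_bound(f1, f2):
    """Assumed form of the lower bound: L-infinity distance on features.

    It never exceeds dtw_distance, because every warping path pairs the
    two first elements and the two last elements, and must pair each
    sequence's extreme value with some element of the other sequence.
    """
    return max(abs(a - b) for a, b in zip(f1, f2))


def range_search(database, query, tolerance):
    """Return the sequences within `tolerance` of `query`.

    Filter step: discard sequences whose lower bound already exceeds the
    tolerance (no false dismissal, since the bound underestimates).
    Refine step: compute the exact distance only for the survivors.
    """
    fq = features(query)
    results = []
    for seq in database:
        if lower_bound(features(seq), fq) > tolerance:
            continue  # pruned without computing the exact distance
        if dtw_distance(seq, query) <= tolerance:
            results.append(seq)
    return results
```

For example, `range_search([[1, 2, 2, 3], [5, 6, 7], [1, 2, 4]], [1, 2, 3], 1.0)` prunes `[5, 6, 7]` in the filter step, because its lower bound of 4 already exceeds the tolerance, and computes the exact distance only for the other two sequences. In an indexed setting the filter step would become a range query on the four-dimensional feature space rather than a scan.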