This paper discusses the effective processing of similarity search that supports time warping in large sequence databases. Time warping enables sequences with similar patterns to be found even when they are of different lengths. Prior methods for similarity search under time warping could not employ multi-dimensional indexes without false dismissal, because the time warping distance does not satisfy the triangle inequality. They must therefore scan the entire database and suffer serious performance degradation in large databases. Another method, which employs a suffix tree and does not assume any particular distance function, also performs poorly because of the large tree size.

In this paper, we propose a novel method for similarity search that supports time warping. Our primary goal is to enhance search performance in large databases without permitting any false dismissal. To attain this goal, we have devised a new distance function, Dtw-lb, which consistently underestimates the time warping distance and satisfies the triangle inequality. Dtw-lb uses a 4-tuple feature vector that is extracted from each sequence and is invariant to time warping. For efficient processing of similarity search, we employ a multi-dimensional index that uses the 4-tuple feature vector as indexing attributes and Dtw-lb as the distance function. We prove that our method does not incur false dismissal. To verify the superiority of our method, we have performed extensive experiments. The results reveal that our method achieves a significant speedup: up to 43 times faster on a data set of real-world S&P 500 stock data sequences, and up to 720 times faster on data sets containing a very large volume of synthetic data sequences. The performance gain increases (1) as the number of data sequences increases, (2) as the average length of data sequences increases, and (3) as the tolerance in a query decreases. Considering the characteristics of real databases, these tendencies imply that our approach is suitable for practical applications.
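The filter-and-refine idea described above can be illustrated with a minimal Python sketch. The abstract does not say which four features make up the vector or how the time warping distance is aggregated, so the sketch makes two assumptions: the features are the first, last, greatest, and smallest elements of a sequence, and the time warping distance takes the maximum element-wise difference along the warping path. Under those assumptions the maximum feature difference never exceeds the warping distance. All function names are hypothetical, and a linear scan over the feature vectors stands in for the multi-dimensional index.

```python
from math import inf


def features(seq):
    """Assumed 4-tuple: (first, last, greatest, smallest) element of seq."""
    return (seq[0], seq[-1], max(seq), min(seq))


def dtw_lb(fs, fq):
    """Lower bound: largest absolute difference between two feature vectors.

    This is the L-infinity distance on 4-tuples, so it satisfies the
    triangle inequality.
    """
    return max(abs(a - b) for a, b in zip(fs, fq))


def dtw_max(s, q):
    """Time warping distance, assuming max-aggregation along the warping path."""
    n, m = len(s), len(q)
    prev = [inf] * (m + 1)
    prev[0] = 0.0  # empty prefix aligned with empty prefix
    for i in range(1, n + 1):
        cur = [inf] * (m + 1)
        for j in range(1, m + 1):
            best = min(prev[j], cur[j - 1], prev[j - 1])
            cur[j] = max(abs(s[i - 1] - q[j - 1]), best)
        prev = cur
    return prev[m]


def search(database, query, eps):
    """Return indexes of sequences whose warping distance to query is <= eps."""
    fq = features(query)
    hits = []
    for idx, seq in enumerate(database):
        # Filter step: a real system would run this as an index range query.
        if dtw_lb(features(seq), fq) > eps:
            continue  # safe to discard: the lower bound already exceeds eps
        # Refine step: compute the exact distance only for surviving candidates.
        if dtw_max(seq, query) <= eps:
            hits.append(idx)
    return hits


if __name__ == "__main__":
    db = [[1, 2, 3, 4], [1, 1, 2, 2, 3, 3, 4, 4], [10, 11, 12], [1, 2, 9, 4]]
    print(search(db, [1, 2, 3, 4], 0.5))
```

Because the filter only discards a sequence when the lower bound already exceeds the tolerance, no qualifying sequence is lost. The savings come from skipping the quadratic-time exact computation for most of the database.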