Fast Retrieval of Similar Subsequences in Long Sequence Databases

Authors:
Sanghyun Park;Dongwon Lee;Wesley W. Chu
Venue:
KDEX '99 Proceedings of the 1999 Workshop on Knowledge and Data Engineering Exchange
Year:
1999

Citing 0
Cited 17

Segment-based approach for subsequence searches in sequence databases
Proceedings of the 2001 ACM symposium on Applied computing

Locally adaptive dimensionality reduction for indexing large time series databases
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data

Locally adaptive dimensionality reduction for indexing large time series databases
ACM Transactions on Database Systems (TODS)

A Simple Dimensionality Reduction Technique for Fast Similarity Search in Large Time Series Databases
PAKDD '00 Proceedings of the 4th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Current Issues and New Applications

On the need for time series data mining benchmarks: a survey and empirical demonstration
Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining

On the Need for Time Series Data Mining Benchmarks: A Survey and Empirical Demonstration
Data Mining and Knowledge Discovery

Similarity search of time-warped subsequences via a suffix tree
Information Systems

Bounded similarity querying for time-series data
Information and Computation - Special issue: Commemorating the 50th birthday anniversary of Paris C. Kanellakis

Exact indexing of dynamic time warping
Knowledge and Information Systems

Exact indexing of dynamic time warping
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases

Searching on the secondary structure of protein sequences
VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases

A Piecewise Linear Representation Method of Time Series Based on Feature Points
KES '07 Knowledge-Based Intelligent Information and Engineering Systems and the XVII Italian Workshop on Neural Networks on Proceedings of the 11th International Conference

Bounded similarity querying for time-series data
Information and Computation

A review on time series data mining
Engineering Applications of Artificial Intelligence

TIDES--a new descriptor for time series oscillation behavior
Geoinformatica

Similarity measure based on piecewise linear approximation and derivative dynamic time warping for time series mining
Expert Systems with Applications: An International Journal

Time-series data mining
ACM Computing Surveys (CSUR)

Abstract

Although the Euclidean distance has been the most popular similarity measure in sequence databases, recent techniques prefer high-cost distance functions such as the time warping distance and the editing distance for their wider applicability. However, when these distance functions are applied to the retrieval of similar subsequences, the number of subsequences to be inspected during the search is quadratic in the average length L of the data sequences. In this paper, we propose a novel subsequence matching scheme, called aligned subsequence matching, in which the number of subsequences to be compared with a query sequence is reduced to linear in L. We also present an indexing technique that speeds up aligned subsequence matching, using the modified time warping distance as the similarity measure. Experiments on synthetic data sequences demonstrate the effectiveness of the proposed approach: it consistently outperformed sequential scanning and achieved up to a 6.5-fold speed-up.
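
To make the quadratic cost mentioned in the abstract concrete, the following minimal Python sketch counts the contiguous subsequences of a sequence and runs a naive search that compares a query against every one of them. It is only an illustration of the baseline problem, not the paper's method: the abstract does not define aligned subsequence matching or the modified time warping distance, so the sketch assumes the textbook dynamic time warping recurrence with an absolute-difference base cost. All function names (`count_subsequences`, `time_warping_distance`, `naive_subsequence_search`) are hypothetical.

```python
def count_subsequences(length):
    """Number of contiguous, non-empty subsequences of a sequence of this length."""
    return length * (length + 1) // 2


def time_warping_distance(a, b):
    """Textbook dynamic time warping distance (assumed here, not the paper's
    modified variant), using |x - y| as the cost of matching two elements."""
    inf = float("inf")
    # prev[j] is the best cost of warping the processed prefix of a onto b[:j].
    prev = [0.0] + [inf] * len(b)
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cost = abs(x - y)
            # Extend the cheapest of the three neighbouring alignments.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[len(b)]


def naive_subsequence_search(data, query, tolerance):
    """Compare the query with every contiguous subsequence of data.
    The number of comparisons is quadratic in len(data)."""
    matches = []
    for start in range(len(data)):
        for end in range(start + 1, len(data) + 1):
            if time_warping_distance(data[start:end], query) <= tolerance:
                matches.append((start, end))
    return matches
```

For a data sequence of length 1000, `count_subsequences(1000)` gives 500500 candidate subsequences, each requiring its own dynamic-programming comparison; this is the cost that a scheme with a linear number of candidates would avoid.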