Similarity search of time-warped subsequences via a suffix tree

Authors:
Sanghyun Park; Wesley W. Chu; Jeehee Yoon; Jungim Won
Affiliations:
Department of Computer Science and Engineering, Pohang University of Science and Technology (POSTECH), Pohang, South Korea; Department of Computer Science, University of California at Los Angeles (UCLA); Division of Information and Communication Engineering, Hallym University, South Korea; Division of Information and Communication Engineering, Hallym University, South Korea
Venue:
Information Systems
Year:
2003

Abstract

This paper proposes an indexing technique for fast retrieval of similar subsequences under the time-warping distance. In many applications, sequences have different lengths or different sampling rates, and for these the time-warping distance is a more suitable similarity measure than the Euclidean distance. The proposed technique uses a disk-based suffix tree as its index structure and applies lower-bound distance functions to filter out dissimilar subsequences without false dismissals. To keep the index compact, and thereby speed up query processing, it converts sequences from the continuous domain into the discrete domain and stores only those suffixes whose first values differ from the first values of the immediately preceding suffixes. Extensive experiments with real and synthetic data sequences show that the proposed approach significantly outperforms both the sequential scan and the LB scan approaches and scales well to large sequence databases.
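
The following is a minimal Python sketch of three ideas named in the abstract: the time-warping distance, the conversion of continuous values into discrete symbols, and the selection of suffixes whose first value differs from that of the preceding suffix. It is not the authors' implementation. The function names, the absolute-difference base distance, and the category boundaries are assumptions made for illustration; the disk-based suffix tree and the lower-bound distance functions are not shown.

```python
from bisect import bisect_right


def time_warping_distance(a, b):
    """Time-warping distance between two numeric sequences.

    Assumption: the base distance between two elements is their
    absolute difference, and element costs are summed along the
    cheapest warping path (classic dynamic programming).
    """
    inf = float("inf")
    n, m = len(a), len(b)
    # prev[j] holds the best cost of aligning a[:i-1] with b[:j].
    prev = [0.0] + [inf] * m
    for i in range(1, n + 1):
        curr = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three neighbouring alignments.
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]


def discretize(seq, boundaries):
    """Map each continuous value to a category index.

    `boundaries` is a sorted list of hypothetical category cut points.
    """
    return [bisect_right(boundaries, v) for v in seq]


def sparse_suffix_starts(symbols):
    """Start positions of suffixes whose first symbol differs from the
    first symbol of the immediately preceding suffix."""
    return [i for i in range(len(symbols))
            if i == 0 or symbols[i] != symbols[i - 1]]


if __name__ == "__main__":
    s = [0.2, 0.4, 1.5, 1.7, 1.6, 2.8, 0.1]
    symbols = discretize(s, boundaries=[1.0, 2.0])
    print(symbols)
    print(sparse_suffix_starts(symbols))
    print(time_warping_distance([0, 0, 1], [0, 1]))
```

In this sketch a run of equal symbols contributes only one suffix start, which is the source of the index compaction the abstract describes: the longer the runs produced by discretization, the fewer suffixes need to be stored.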