Efficient Subsequence Matching Using the Longest Common Subsequence with a Dual Match Index

Authors:
Tae Sik Han;Seung-Kyu Ko;Jaewoo Kang
Affiliations:
Dept. of Computer Science, North Carolina State University, Raleigh, NC 27569, USA;Dept. of Computer Science, North Carolina State University, Raleigh, NC 27569, USA;Dept. of Computer Science and Engineering, Korea University, Seoul 136-705, Korea
Venue:
MLDM '07 Proceedings of the 5th international conference on Machine Learning and Data Mining in Pattern Recognition
Year:
2007

Citing 9
Cited 3

Dynamic programming algorithm optimization for spoken word recognition

Readings in speech recognition
Fast subsequence matching in time-series databases

SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
Efficient time-series subsequence matching using duality in constructing windows

Information Systems
Efficient Similarity Search In Sequence Databases

FODO '93 Proceedings of the 4th International Conference on Foundations of Data Organization and Algorithms
Duality-Based Subsequence Matching in Time-Series Databases

Proceedings of the 17th International Conference on Data Engineering
Warping indexes with envelope transforms for query by humming

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Discovering Similar Multidimensional Trajectories

ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Indexing multi-dimensional time-series with support for multiple distance measures

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Exact indexing of dynamic time warping

VLDB '02 Proceedings of the 28th international conference on Very Large Data Bases

WHAM: a high-throughput sequence alignment method

Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
A survey of query-by-humming similarity methods

Proceedings of the 5th International Conference on PErvasive Technologies Related to Assistive Environments
Fast Longest Common Subsequence with General Integer Scoring Support on GPUs

Proceedings of Programming Models and Applications on Multicores and Manycores

Quantified Score

Hi-index	0.00

Visualization

Abstract

The purpose of subsequence matching is to find a query sequence from a long data sequence. Due to the abundance of applications, many solutions have been proposed. Virtually all previous solutions use the Euclidean measure as the basis for measuring distance between sequences. Recent studies, however, suggest that the Euclidean distance often fails to produce proper results due to the irregularity in the data, which is not so uncommon in our problem domain. Addressing this problem, some non-Euclidean measures, such as Dynamic Time Warping (DTW)and Longest Common Subsequence (LCS), have been proposed. However, most of the previous work in this direction focused on the whole sequence matching problem where query and data sequences are the same length. In this paper, we propose a novel subsequence matching framework using a non-Euclidean measure, in particular, LCS, and a new index query scheme. The proposed framework is based on the Dual Match framework where data sequences are divided into a series of disjoint equi-length subsequences and then indexed in an R-tree. We introduced similarity bound for index matching with LCS. The proposed query matching scheme reduces significant numbers of false positives in the match result. Furthermore, we developed an algorithm to skip expensive LCScomputations through observing the warping paths. We validated our framework through extensive experiments using 48 different time series datasets. The results of the experiments suggest that our approach significantly improves the subsequence matching performance in various metrics.