An efficient subsequence matching method based on index interpolation

Authors:
Hyun-Gil Koh;Woong-Kee Loh;Sang-Wook Kim
Affiliations:
Department of Information and Communication Engineering, Kangwon National University, Korea;Department of Computer Science, Korea Advanced Institute of Science and Technology, Korea;College of Information and Communications, Hanyang University, Korea
Venue:
IEA/AIE'2005 Proceedings of the 18th international conference on Innovations in Applied Artificial Intelligence
Year:
2005

Abstract

Subsequence matching is one of the most important problems in data mining. Existing subsequence matching algorithms build a single index using windows of one fixed size. Their performance degrades as the difference between the query sequence length and the window size grows. In this paper, we propose a new subsequence matching method based on index interpolation, a technique that constructs indexes for multiple window sizes and, for a given query sequence, chooses the most appropriate index for subsequence matching. We first examine how the window size affects performance through preliminary experiments, and devise a cost function for subsequence matching that reflects the distribution of query sequence lengths from the viewpoint of physical database design. Next, we propose a new subsequence matching method that improves search performance, and present an algorithm that uses the cost function to construct the multiple indexes so as to maximize performance. Finally, we verify the superiority of the proposed method through a series of experiments using real and synthetic data sequences.
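
The idea of keeping several indexes and picking one per query can be illustrated with a minimal Python sketch. It is not the paper's algorithm. It assumes that the "most appropriate" index is the one with the largest window size not exceeding the query length, and it uses a toy cost (the frequency-weighted gap between query length and chosen window size) as a stand-in for the paper's cost function, which the abstract does not give. The names `pick_window`, `expected_cost`, and `choose_windows` are hypothetical.

```python
from itertools import combinations


def pick_window(window_sizes, query_len):
    """Pick the index to use for a query (assumed rule, not from the paper):
    the largest indexed window size that does not exceed the query length."""
    usable = [w for w in window_sizes if w <= query_len]
    if not usable:
        raise ValueError("query is shorter than every indexed window size")
    return max(usable)


def expected_cost(window_sizes, length_freq):
    """Toy cost (an assumption): frequency-weighted gap between each query
    length and the window size chosen for it. A larger gap is treated as
    slower, following the abstract's remark on the window size effect."""
    return sum(
        freq * (qlen - pick_window(window_sizes, qlen))
        for qlen, freq in length_freq.items()
    )


def choose_windows(candidates, length_freq, k):
    """Exhaustively choose k window sizes that minimize the toy cost."""
    shortest_query = min(length_freq)
    best = None
    for combo in combinations(sorted(candidates), k):
        if combo[0] > shortest_query:
            continue  # the shortest query would have no usable index
        cost = expected_cost(combo, length_freq)
        if best is None or cost < best[1]:
            best = (combo, cost)
    if best is None:
        raise ValueError("no combination covers the shortest query length")
    return best


# Hypothetical workload: query length -> number of queries of that length.
workload = {64: 2, 128: 3, 512: 4, 1024: 1}
windows, cost = choose_windows([64, 128, 256, 512], workload, k=2)
print(windows, cost)
print(pick_window(windows, 700))
```

With this made-up workload the sketch keeps window sizes 64 and 512, since most queries are either short or at least 512 long. A query of length 700 is then served by the index built for window size 512. Exhaustive search is used only for clarity and would not scale to many candidate window sizes.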