Fast subsequence matching in time-series databases
SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
Efficient Similarity Search In Sequence Databases
FODO '93 Proceedings of the 4th International Conference on Foundations of Data Organization and Algorithms
Duality-Based Subsequence Matching in Time-Series Databases
Proceedings of the 17th International Conference on Data Engineering
Fast Similarity Search in the Presence of Noise, Scaling, and Translation in Time-Series Databases
VLDB '95 Proceedings of the 21th International Conference on Very Large Data Bases
Efficient Time Series Matching by Wavelets
ICDE '99 Proceedings of the 15th International Conference on Data Engineering
A Subsequence Matching Algorithm that Supports Normalization Transform in Time-Series Databases
Data Mining and Knowledge Discovery
Hi-index | 0.00 |
Subsequence matching is one of the most important issues in the field of data mining. The existing subsequence matching algorithms use windows of the fixed size to construct only one index. The algorithms have a problem that their performance gets worse as the difference between the query sequence length and the window size increases. In this paper, we propose a new subsequence matching method based on index interpolation, which is a technique that constructs the indexes for multiple window sizes and chooses an index most appropriate for a given query sequence for subsequence matching. We first examine the performance change due to the window size effect through preliminary experiments, and devise a cost function for subsequence matching that reflects the distribution of query sequence lengths in the view point of physical database design. Next, we propose a new subsequence matching method to improve search performance, and present an algorithm based on the cost function to construct the multiple indexes to maximize the performance. Finally, we verify the superiority of the proposed method through a series of experiments using the real and the synthetic data sequences.