Efficient indexing of interval time sequences

Authors:
Jong-Won Roh;Byoung-Kee Yi
Affiliations:
Department of Computer Science and Engineering, POSTECH, San 31, Hyoja-dong, Pohang, Republic of Korea;IHIS Research Center, Kyungpook National University, 101, Dongin-2ga, Daegu, Republic of Korea
Venue:
Information Processing Letters
Year:
2008

Citing 18
Cited 1

The R*-tree: an efficient and robust access method for points and rectangles

SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
Computing the minimum Hausdorff distance between two point sets on a line under translation

Information Processing Letters
Fast subsequence matching in time-series databases

SIGMOD '94 Proceedings of the 1994 ACM SIGMOD international conference on Management of data
Similarity-based queries for time series data

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Information Retrieval

Information Retrieval
General match: a subsequence matching method in time-series databases based on generalized windows

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Locally adaptive dimensionality reduction for indexing large time series databases

ACM Transactions on Database Systems (TODS)
On the 'Dimensionality Curse' and the 'Self-Similarity Blessing'

IEEE Transactions on Knowledge and Data Engineering
Efficient Similarity Search In Sequence Databases

FODO '93 Proceedings of the 4th International Conference on Foundations of Data Organization and Algorithms
Efficient Retrieval of Similar Time Sequences Under Time Warping

ICDE '98 Proceedings of the Fourteenth International Conference on Data Engineering
Duality-Based Subsequence Matching in Time-Series Databases

Proceedings of the 17th International Conference on Data Engineering
Fast Time Sequence Indexing for Arbitrary Lp Norms

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Similarity Searching for Multi-Attribute Sequences

SSDBM '02 Proceedings of the 14th International Conference on Scientific and Statistical Database Management
Similarity Search for Multidimensional Data Sequences

ICDE '00 Proceedings of the 16th International Conference on Data Engineering
Discovering Similar Multidimensional Trajectories

ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Indexing multi-dimensional time-series with support for multiple distance measures

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Bounded similarity querying for time-series data

Information and Computation - Special issue: Commemorating the 50th birthday anniversary of Paris C. Kanellakis
New clustering methods for interval data

Computational Statistics

Efficient bitmap-based indexing of time-based interval sequences

Information Sciences: an International Journal

Quantified Score

Hi-index	0.89

Visualization

Abstract

Time sequences, which are ordered sets of observations, have been studied in various database applications. In this paper, we introduce a new class of time sequences where each observation is represented by an interval rather than a number. Such sequences may arise in many situations. For instance, we may not be able to determine the exact value at a time point due to uncertainty or aggregation. Such observation may be represented better by a range of possible values. Similarity search with interval time sequences as both query and data sequences poses a new challenge for research. We first address the issue of (dis)similarity measures for interval time sequences. We choose an L"1 norm-based measure because it effectively quantifies the degree of overlapping and remoteness between two intervals, and is invariant irrespective of the position of an interval when it is enclosed within another interval. We next propose an efficient indexing technique for fast retrieval of similar interval time sequences from large databases. More specifically, we propose: (1) to extract a segment-based feature vector for each sequence, and (2) to map each feature vector to either a point or a hyper-rectangle in a multi-dimensional feature space. We then show how we can use existing multi-dimensional index structures such as the R-tree for efficient query processing. The proposed method guarantees no false dismissals. Experimental results show that, for synthetic and real stock data, it is superior to sequential scanning in performance and scales well with the data size.