Time sequences, which are ordered sets of observations, have been studied in various database applications. In this paper, we introduce a new class of time sequences in which each observation is represented by an interval rather than a number. Such sequences arise in many situations. For instance, we may not be able to determine the exact value at a time point because of uncertainty or aggregation, and such an observation is better represented by a range of possible values. Similarity search in which both the query and the data sequences are interval time sequences poses a new research challenge. We first address the issue of (dis)similarity measures for interval time sequences. We choose an L1-norm-based measure because it effectively quantifies the degree of overlap and remoteness between two intervals, and because its value does not depend on the position of an interval when that interval is enclosed within another. We next propose an efficient indexing technique for fast retrieval of similar interval time sequences from large databases. More specifically, we propose (1) to extract a segment-based feature vector from each sequence, and (2) to map each feature vector to either a point or a hyper-rectangle in a multi-dimensional feature space. We then show how existing multi-dimensional index structures such as the R-tree can be used for efficient query processing. The proposed method guarantees no false dismissals. Experimental results on synthetic and real stock data show that it outperforms sequential scanning and scales well with the data size.
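The abstract does not spell out the measure or the feature extraction, so the following is only a minimal sketch under stated assumptions. It assumes the L1-norm-based measure is the sum of absolute differences between the lower and the upper endpoints of two intervals, which is one form that has the invariance property described above. The segment feature used here (per-segment sums of lower and upper endpoints) is a hypothetical stand-in, not the paper's actual feature vector; it is chosen only because its L1 distance never exceeds the true distance, which is the condition a filter needs to avoid false dismissals. All function names are illustrative.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (lower, upper), with lower <= upper


def interval_distance(a: Interval, b: Interval) -> float:
    """Assumed L1 form: |lower_a - lower_b| + |upper_a - upper_b|."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def sequence_distance(s: List[Interval], q: List[Interval]) -> float:
    """Sum of per-observation interval distances for equal-length sequences."""
    if len(s) != len(q):
        raise ValueError("sequences must have equal length")
    return sum(interval_distance(a, b) for a, b in zip(s, q))


def segment_features(s: List[Interval], n_segments: int) -> List[float]:
    """Hypothetical feature: per-segment sums of lower and upper endpoints."""
    if n_segments <= 0 or len(s) % n_segments != 0:
        raise ValueError("length must be a positive multiple of n_segments")
    width = len(s) // n_segments
    features: List[float] = []
    for start in range(0, len(s), width):
        segment = s[start:start + width]
        features.append(sum(lo for lo, _ in segment))
        features.append(sum(hi for _, hi in segment))
    return features


def feature_distance(f: List[float], g: List[float]) -> float:
    """L1 distance in feature space; a lower bound on sequence_distance,
    because |sum(x) - sum(y)| <= sum(|x - y|) within each segment."""
    return sum(abs(x - y) for x, y in zip(f, g))


if __name__ == "__main__":
    outer = (0.0, 10.0)
    # An enclosed interval of length 2 gives the same distance wherever it sits.
    for inner in [(1.0, 3.0), (4.0, 6.0), (8.0, 10.0)]:
        print(inner, interval_distance(outer, inner))

    s = [(1.0, 2.0), (2.0, 4.0), (3.0, 3.0), (0.0, 5.0)]
    q = [(2.0, 3.0), (1.0, 1.0), (4.0, 6.0), (1.0, 2.0)]
    lower_bound = feature_distance(segment_features(s, 2), segment_features(q, 2))
    print(lower_bound, "<=", sequence_distance(s, q))
```

Under this assumed form, the distance between an interval and one enclosed within it reduces to the difference of their lengths, which is why the position of the enclosed interval does not matter. A range query with threshold eps could then discard any sequence whose feature distance exceeds eps and verify the remaining candidates with `sequence_distance`; because the feature distance is a lower bound, no qualifying sequence is discarded.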