Efficient indexing of interval time sequences

  • Authors:
  • Jong-Won Roh;Byoung-Kee Yi

  • Affiliations:
  • Department of Computer Science and Engineering, POSTECH, San 31, Hyoja-dong, Pohang, Republic of Korea;IHIS Research Center, Kyungpook National University, 101, Dongin-2ga, Daegu, Republic of Korea

  • Venue:
  • Information Processing Letters
  • Year:
  • 2008

Quantified Score

Hi-index 0.89

Visualization

Abstract

Time sequences, which are ordered sets of observations, have been studied in various database applications. In this paper, we introduce a new class of time sequences where each observation is represented by an interval rather than a number. Such sequences may arise in many situations. For instance, we may not be able to determine the exact value at a time point due to uncertainty or aggregation. Such observation may be represented better by a range of possible values. Similarity search with interval time sequences as both query and data sequences poses a new challenge for research. We first address the issue of (dis)similarity measures for interval time sequences. We choose an L"1 norm-based measure because it effectively quantifies the degree of overlapping and remoteness between two intervals, and is invariant irrespective of the position of an interval when it is enclosed within another interval. We next propose an efficient indexing technique for fast retrieval of similar interval time sequences from large databases. More specifically, we propose: (1) to extract a segment-based feature vector for each sequence, and (2) to map each feature vector to either a point or a hyper-rectangle in a multi-dimensional feature space. We then show how we can use existing multi-dimensional index structures such as the R-tree for efficient query processing. The proposed method guarantees no false dismissals. Experimental results show that, for synthetic and real stock data, it is superior to sequential scanning in performance and scales well with the data size.