DDR: an index method for large time-series datasets

Authors:
Jiyuan An;Yi-Ping Phoebe Chen;Hanxiong Chen
Affiliations:
School of Information Technology, Faculty of Science and Technology, Deakin University, Melbourne Campus, Burwood, Victoria, Melbourne, 3125, Australia;School of Information Technology, Faculty of Science and Technology, Deakin University, Melbourne Campus, Burwood, Victoria, Melbourne, 3125, Australia and Australia Research Council Centre in Bio ...;Institute of Information Sciences and Electronics, University of Tsukuba, 1-1-1, Tennodai, Tsukuba shi, Ibaraki ken, Japan
Venue:
Information Systems
Year:
2005

Citing 14
Cited 4

The R*-tree: an efficient and robust access method for points and rectangles

SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
The SR-tree: an index structure for high-dimensional nearest neighbor queries

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
Optimal multi-step k-nearest neighbor search

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
A comparison of DFT and DWT based similarity search in time-series databases

Proceedings of the ninth international conference on Information and knowledge management
Locally adaptive dimensionality reduction for indexing large time series databases

SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data
The convex polyhedra technique: an index structure for high-dimensional space

ADC '02 Proceedings of the 13th Australasian database conference - Volume 5
R-trees: a dynamic index structure for spatial searching

SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data
Efficient Similarity Search In Sequence Databases

FODO '93 Proceedings of the 4th International Conference on Foundations of Data Organization and Algorithms
A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces

VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
Fast Time Sequence Indexing for Arbitrary Lp Norms

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces

VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
C2VA: Trim High Dimensional Indexes

WAIM '02 Proceedings of the Third International Conference on Advances in Web-Age Information Management
Mining Deviants in a Time Series Database

VLDB '99 Proceedings of the 25th International Conference on Very Large Data Bases
The Hybrid Tree: An Index Structure for High Dimensional Feature Spaces

ICDE '99 Proceedings of the 15th International Conference on Data Engineering

Efficient moving average transform-based subsequence matching algorithms in time-series databases

Information Sciences: an International Journal
Brief Communication: Finding rule groups to classify high dimensional gene expression datasets

Computational Biology and Chemistry
Exploring the ncRNA-ncRNA patterns based on bridging rules

Journal of Biomedical Informatics
Significant Cancer Prevention Factor Extraction: An Association Rule Discovery Approach

Journal of Medical Systems

Abstract

The tree index structure is a traditional method for searching for similar data in large datasets. It rests on the presupposition that most sub-trees are pruned during the search, which reduces the number of page accesses. However, time-series datasets generally have very high dimensionality, and because of the so-called curse of dimensionality, pruning becomes less effective as dimensionality grows. Consequently, the tree index structure is not a suitable method for time-series datasets. In this paper, we propose a two-phase (filtering and refinement) method for searching time-series datasets. In the filtering step, quantized time series are used to construct a compact file, which is scanned to filter out irrelevant data. A small set of candidates is passed to the second step for refinement. In this step, we introduce an effective index compression method named grid-based datawise dimensionality reduction (DDR), which attempts to preserve the characteristics of the time series. An experimental comparison with existing techniques demonstrates the utility of our approach.
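
The two-phase idea in the abstract can be illustrated with a minimal sketch. This is not the authors' DDR method: it assumes a plain uniform scalar quantization of each value, a Euclidean range query, and that every value lies in [lo, hi]. The function names (quantize, lower_bound, range_search) and parameters (lo, hi, levels, radius) are hypothetical and chosen only for illustration.

```python
import math


def quantize(series, lo, hi, levels):
    """Map each value to a grid cell index in [0, levels - 1].

    Assumption: all values lie in [lo, hi]; the clamp only handles v == hi.
    """
    width = (hi - lo) / levels
    return [min(levels - 1, max(0, int((v - lo) / width))) for v in series]


def lower_bound(query, cells, lo, hi, levels):
    """Lower bound on the Euclidean distance between the query and any
    series whose values fall in the given grid cells."""
    width = (hi - lo) / levels
    total = 0.0
    for q, c in zip(query, cells):
        cell_lo = lo + c * width
        cell_hi = cell_lo + width
        if q < cell_lo:
            total += (cell_lo - q) ** 2
        elif q > cell_hi:
            total += (q - cell_hi) ** 2
        # If q lies inside the cell, this dimension contributes nothing.
    return math.sqrt(total)


def range_search(query, data, approx, lo, hi, levels, radius):
    """Filter on the compact approximations, then refine on the raw series."""
    # Phase 1 (filtering): scan the quantized file and keep every series
    # whose lower bound does not already exceed the radius.
    candidates = [
        i for i, cells in enumerate(approx)
        if lower_bound(query, cells, lo, hi, levels) <= radius
    ]
    # Phase 2 (refinement): compute exact distances for the candidates only.
    results = [i for i in candidates if math.dist(query, data[i]) <= radius]
    return candidates, results
```

Because the lower bound never exceeds the true distance, the filtering scan cannot discard a series that belongs in the answer; it can only let through false positives, which the refinement step removes using exact distances.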