Ensemble-index: a new approach to indexing large databases

Authors:
Eamonn Keogh; Selina Chu; Michael Pazzani
Affiliations:
University of California, Irvine, California; University of California, Irvine, California; University of California, Irvine, California
Venue:
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2001



Abstract

The problem of similarity search (query-by-content) has attracted much research interest. It is a difficult problem because of the inherently high dimensionality of the data. The most promising solutions involve performing dimensionality reduction on the data, then indexing the reduced data with a multidimensional index structure. Many dimensionality reduction techniques have been proposed, including Singular Value Decomposition (SVD), the Discrete Fourier Transform (DFT), the Discrete Wavelet Transform (DWT) and Piecewise Polynomial Approximation. In this work, we introduce a novel framework for using ensembles of two or more representations for more efficient indexing. The basic idea is that instead of committing to a single representation for an entire dataset, different representations are chosen for indexing different parts of the database. The representations are chosen based upon a local view of the database. For example, sections of the data that can achieve a high fidelity representation with wavelets are indexed as wavelets, but highly spectral sections of the data are indexed using the Fourier transform. At query time, it is necessary to search several small heterogeneous indices, rather than one large homogeneous index. As we will demonstrate theoretically and empirically, this results in much faster query response times.
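
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several assumptions that the abstract does not state: the representation is chosen per sequence by reconstruction error (the paper chooses it from a local view of the database), only two representations are used (an orthonormal Haar wavelet transform and an orthonormal DFT), sequence lengths are powers of two, and a linear scan over the reduced features stands in for a real multidimensional index. All function names (`haar`, `dft`, `reduce_seq`, `best_rep`, `build`, `nearest`) are hypothetical.

```python
import cmath
import math

def haar(x):
    """Orthonormal Haar transform, coarse-to-fine; len(x) must be a power of two."""
    coeffs, approx = [], list(x)
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        detail = [(a - b) / math.sqrt(2) for a, b in pairs]
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
        coeffs = detail + coeffs  # finer details end up toward the end
    return approx + coeffs

def dft(x, m):
    """First m orthonormal DFT coefficients of a real sequence."""
    n = len(x)
    return [
        sum(v * cmath.exp(-2j * math.pi * j * t / n) for t, v in enumerate(x))
        / math.sqrt(n)
        for j in range(m)
    ]

def reduce_seq(x, rep, d):
    """Map x to d real features (assumes d even and d <= len(x)).

    Both transforms are orthonormal, so Euclidean distance between
    feature vectors never exceeds the true Euclidean distance.
    """
    if rep == "haar":
        return haar(x)[:d]
    feats = []
    for j, c in enumerate(dft(x, d // 2)):
        # For real input, coefficient n-j mirrors coefficient j and carries
        # the same energy, so non-DC coefficients are weighted by sqrt(2).
        w = 1.0 if j == 0 else math.sqrt(2)
        feats += [w * c.real, w * c.imag]
    return feats

def best_rep(x, d):
    """Pick the representation that loses the least energy for this sequence."""
    energy = sum(v * v for v in x)
    def lost(rep):
        return energy - sum(f * f for f in reduce_seq(x, rep, d))
    return min(("haar", "dft"), key=lost)

def build(data, d):
    """Build one small index per representation (lists stand in for real indices)."""
    index = {"haar": [], "dft": []}
    for i, x in enumerate(data):
        rep = best_rep(x, d)
        index[rep].append((i, reduce_seq(x, rep, d)))
    return index

def dist(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def nearest(query, data, index, d):
    """1-nearest-neighbor search over all indices, pruning with the lower bound."""
    best_id, best = None, math.inf
    for rep, entries in index.items():
        q = reduce_seq(query, rep, d)  # the query is reduced once per index
        for i, feats in entries:
            if dist(q, feats) < best:  # lower bound cannot rule this entry out
                true = dist(query, data[i])
                if true < best:
                    best_id, best = i, true
    return best_id, best

# Toy database: a pure sinusoid (spectral) and a step function (piecewise constant).
n = 16
sine = [math.sin(2 * math.pi * 3 * t / n) for t in range(n)]
step = [0.0] * 8 + [1.0] * 8
data = [sine, step]
index = build(data, 8)
query = [v + 0.05 for v in step]
match_id, match_dist = nearest(query, data, index, 8)
```

In this toy database the sinusoid lands in the DFT index and the step function in the Haar index, and a query is transformed once per index before the candidates in that index are checked. Because the reduced-space distance is a lower bound on the true distance, pruning on it cannot dismiss the true nearest neighbor.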