A cost model for query processing in high dimensional data spaces

Authors:
Christian Böhm
Affiliations:
Univ. of Munich, Munich, Germany
Venue:
ACM Transactions on Database Systems (TODS)
Year:
2000

Abstract

During the last decade, multimedia databases have become increasingly important in many application areas such as medicine, CAD, geography, and molecular biology. An important research topic in multimedia databases is similarity search in large data sets. Most current approaches that address similarity search use the feature approach, which transforms important properties of the stored objects into points of a high-dimensional space (feature vectors). Thus, similarity search is transformed into a neighborhood search in feature space. Multidimensional index structures are usually applied when managing feature vectors. Query processing can be improved substantially with optimization techniques such as blocksize optimization, data space quantization, and dimension reduction. To determine optimal parameters, an accurate estimate of index-based query processing performance is crucial. In this paper we develop a cost model for index structures for point databases such as the R*-tree and the X-tree. It provides accurate estimates of the number of data page accesses for range queries and nearest-neighbor queries under a Euclidean metric and a maximum metric. The problems specific to high-dimensional data spaces, called boundary effects, are considered. The concept of the fractal dimension is used to take the effects of correlated data into account.
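The boundary effects mentioned in the abstract can be illustrated with a small experiment. The sketch below is not the paper's cost model; it is a minimal illustration that assumes data and query points are uniformly distributed in the unit cube [0, 1]^d. It compares the unclipped volume of a range query (a hypersphere under the Euclidean metric, a hypercube under the maximum metric) with a Monte Carlo estimate of the fraction of points the query actually retrieves. All function names (`ball_volume`, `distance`, `simulated_selectivity`) and parameter values are hypothetical and chosen only for this example.

```python
import math
import random


def ball_volume(d, r, metric="euclidean"):
    """Volume of a d-dimensional query region of radius r, ignoring the data space boundary."""
    if metric == "maximum":
        # Under the maximum metric the query region is a hypercube with side 2r.
        return (2.0 * r) ** d
    # Under the Euclidean metric it is a hypersphere: pi^(d/2) / Gamma(d/2 + 1) * r^d.
    return math.pi ** (d / 2.0) / math.gamma(d / 2.0 + 1.0) * r ** d


def distance(p, q, metric="euclidean"):
    """Distance between two points under the Euclidean or the maximum metric."""
    if metric == "maximum":
        return max(abs(a - b) for a, b in zip(p, q))
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def simulated_selectivity(d, r, metric="euclidean", samples=20000, seed=0):
    """Monte Carlo estimate of the probability that a uniform data point lies
    within distance r of a uniform query point, both drawn from [0, 1]^d.
    Because the query region is clipped by the data space boundary, this value
    can be far below the unclipped volume."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        p = [rng.random() for _ in range(d)]
        q = [rng.random() for _ in range(d)]
        if distance(p, q, metric) <= r:
            hits += 1
    return hits / samples


if __name__ == "__main__":
    radius = 0.5  # assumed query radius, for illustration only
    for dim in (2, 8, 16):
        for metric in ("euclidean", "maximum"):
            naive = min(1.0, ball_volume(dim, radius, metric))
            simulated = simulated_selectivity(dim, radius, metric)
            print(f"d={dim:2d} {metric:9s} unclipped={naive:.4f} simulated={simulated:.4f}")
```

Under these assumptions, the gap between the unclipped volume and the simulated selectivity widens as the dimension grows, which is one way to see why an estimate that ignores the data space boundary becomes unreliable in high-dimensional spaces.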