Deflating the Dimensionality Curse Using Multiple Fractal Dimensions

Venue: ICDE '00 Proceedings of the 16th International Conference on Data Engineering
Year: 2000

Abstract

Nearest neighbor queries are important in many settings, including spatial databases ("Find the k closest cities") and multimedia databases ("Find the k most similar images"). Previous analyses have concluded that nearest neighbor search is hopeless in high dimensions, due to the notorious "curse of dimensionality". However, a precise analysis of nearest neighbor search over real data sets is still an open problem.

The typical and often implicit assumption in previous studies is that the data is uniformly distributed, with independence between attributes. However, real data sets overwhelmingly disobey these assumptions; rather, they typically are skewed and exhibit intrinsic ("fractal") dimensionalities that are much lower than their embedding dimension, e.g., due to subtle dependencies between attributes.

In this paper, we show how the Hausdorff and Correlation fractal dimensions of a data set can yield extremely accurate formulas that can predict I/O performance to within one standard deviation. The practical contributions of this work are our accurate formulas, which can be used for query optimization in spatial and multimedia databases. The theoretical contribution is the "deflation" of the dimensionality curse.

Our theoretical and empirical results show that previous worst-case analyses of nearest neighbor search in high dimensions are over-pessimistic, to the point of being unrealistic. The performance depends critically on the intrinsic ("fractal") dimensionality, as opposed to the embedding dimension that the uniformity assumption incorrectly implies.
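To make the notion of an intrinsic dimensionality below the embedding dimension concrete, the following is a minimal sketch of a box-counting estimate of the correlation fractal dimension. It is an illustration under stated assumptions, not necessarily the procedure used in the paper: the function name `correlation_dimension`, the choice of cell sides, and the synthetic data set (points on a straight line embedded in five dimensions) are all hypothetical. The estimate is the slope of log S2(r) against log r, where S2(r) is the sum of squared cell occupancy fractions for a grid of cell side r.

```python
import numpy as np


def correlation_dimension(points, cell_sides):
    """Estimate the correlation fractal dimension by box counting.

    For each cell side r, a grid is laid over the data (rescaled to the
    unit cube), S2(r) = sum_i p_i**2 is computed from the fraction p_i of
    points in each occupied cell, and the returned value is the
    least-squares slope of log S2(r) versus log r.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)

    # Rescale every attribute to [0, 1]; constant attributes are left at 0.
    lo = points.min(axis=0)
    span = points.max(axis=0) - lo
    span[span == 0] = 1.0
    unit = (points - lo) / span

    log_r, log_s2 = [], []
    for r in cell_sides:
        cells_per_axis = int(round(1.0 / r))
        # Integer grid coordinates; the value 1.0 is clipped into the last cell.
        idx = np.minimum((unit * cells_per_axis).astype(np.int64),
                         cells_per_axis - 1)
        # Count how many points fall into each occupied cell.
        _, counts = np.unique(idx, axis=0, return_counts=True)
        s2 = np.sum((counts / n) ** 2)
        log_r.append(np.log(r))
        log_s2.append(np.log(s2))

    slope, _ = np.polyfit(log_r, log_s2, 1)
    return float(slope)


# Hypothetical data: 4096 points on a line embedded in 5 dimensions.
t = np.linspace(0.0, 1.0, 4096)
line_5d = np.outer(t, [1.0, 2.0, -1.0, 0.5, 3.0])

cell_sides = [1 / 4, 1 / 8, 1 / 16, 1 / 32]
d2_line = correlation_dimension(line_5d, cell_sides)
print(f"embedding dimension: {line_5d.shape[1]}, estimated D2: {d2_line:.2f}")
```

For this data set the embedding dimension is 5, yet every attribute is a linear function of a single parameter, so the estimate should come out close to 1. This is the kind of gap between embedding and intrinsic dimensionality that the abstract attributes to dependencies between attributes.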