Nearest neighbor queries are important in many settings, including spatial databases ("Find the k closest cities") and multimedia databases ("Find the k most similar images"). Previous analyses have concluded that nearest neighbor search is hopeless in high dimensions, due to the notorious "curse of dimensionality". However, a precise analysis of nearest neighbor search over real data sets is still an open problem.

The typical and often implicit assumption in previous studies is that the data is uniformly distributed, with independence between attributes. However, real data sets overwhelmingly violate these assumptions; rather, they are typically skewed and exhibit intrinsic ("fractal") dimensionalities that are much lower than their embedding dimension, e.g., due to subtle dependencies between attributes.

In this paper, we show how the Hausdorff and Correlation fractal dimensions of a data set can yield extremely accurate formulas that predict I/O performance to within one standard deviation. The practical contribution of this work is these accurate formulas, which can be used for query optimization in spatial and multimedia databases. The theoretical contribution is the 'deflation' of the dimensionality curse.

Our theoretical and empirical results show that previous worst-case analyses of nearest neighbor search in high dimensions are over-pessimistic, to the point of being unrealistic. The performance depends critically on the intrinsic ("fractal") dimensionality, as opposed to the embedding dimension that the uniformity assumption incorrectly implies.
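
To make the gap between intrinsic and embedding dimension concrete, the following is a minimal sketch of one common way to estimate a correlation fractal dimension: count the fraction of point pairs within distance r and fit the slope of log C(r) against log r. It is not the procedure or the I/O formulas from the paper, which the abstract does not spell out. The function name `correlation_dimension`, the sample sizes, and the radii are all illustrative assumptions.

```python
import math
import random


def correlation_dimension(points, radii):
    """Estimate the correlation fractal dimension of `points`.

    C(r) is the fraction of point pairs whose Euclidean distance is <= r.
    The estimate is the least-squares slope of log C(r) versus log r.
    (Hypothetical helper for illustration; a brute-force O(n^2) pair count.)
    """
    n = len(points)
    dists = [
        math.dist(points[i], points[j])
        for i in range(n)
        for j in range(i + 1, n)
    ]
    total = len(dists)

    xs, ys = [], []
    for r in radii:
        count = sum(1 for d in dists if d <= r)
        if count > 0:  # skip radii with no pairs: log(0) is undefined
            xs.append(math.log(r))
            ys.append(math.log(count / total))
    if len(xs) < 2:
        raise ValueError("need at least two radii with a non-zero pair count")

    # Ordinary least-squares slope of ys against xs.
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


rng = random.Random(0)  # fixed seed so the run is repeatable
n_points = 600
radii = [0.05, 0.1, 0.2]  # assumed radii, small relative to the unit cube

# Uniform, independent attributes: intrinsic dimension close to the embedding dimension 3.
cube = [(rng.random(), rng.random(), rng.random()) for _ in range(n_points)]

# Fully dependent attributes: points on a diagonal line, still embedded in 3-D.
line = []
for _ in range(n_points):
    t = rng.random()
    line.append((t, t, t))

d_cube = correlation_dimension(cube, radii)
d_line = correlation_dimension(line, radii)
print(f"uniform cube in 3-D: estimated dimension {d_cube:.2f}")
print(f"line embedded in 3-D: estimated dimension {d_line:.2f}")
```

Both data sets have embedding dimension 3, but the line's estimate should come out near 1 while the uniform cube's stays well above 2 (boundary effects pull it somewhat below 3). The line illustrates the kind of attribute dependency the abstract describes as lowering the intrinsic dimensionality.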