Nearest neighbor search and many other numerical data analysis tools most often rely on the Euclidean distance. When data are high dimensional, however, Euclidean distances seem to concentrate: all distances between pairs of data elements seem to be very similar. The relevance of the Euclidean distance has therefore been questioned in the past, and fractional norms (Minkowski-like norms with an exponent less than one) were introduced to fight the concentration phenomenon. This paper justifies the use of alternative distances to fight concentration by showing that concentration is indeed an intrinsic property of the distances and not an artifact of a finite sample. Furthermore, an estimate of the concentration as a function of the exponent of the distance and of the distribution of the data is given. It leads to the conclusion that, contrary to what is generally assumed, fractional norms are not always less concentrated than the Euclidean norm; a counterexample is given to prove this claim. Theoretical arguments are presented which show that the concentration phenomenon can appear for real data that do not match the hypotheses of the theorems, in particular the assumption of independent and identically distributed variables. Finally, some insights into how to choose an optimal metric are given.
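The concentration effect described above can be observed numerically. The following is a minimal sketch, not the paper's own procedure. It assumes i.i.d. uniform data on [0, 1] and measures concentration as the ratio of the standard deviation of the norms to their mean, which is one common choice. The function names `minkowski_norm` and `relative_variance`, the sample size, and the seed are all illustrative assumptions.

```python
import random
import statistics


def minkowski_norm(x, p):
    """Minkowski-like norm (sum |x_i|^p)^(1/p); p < 1 gives a fractional norm."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def relative_variance(dim, p, n_points=500, seed=0):
    """Std/mean of the norms of n_points i.i.d. uniform[0, 1]^dim vectors.

    Assumption: this ratio is used here as the concentration measure;
    smaller values mean the norms are more alike (more concentrated).
    """
    rng = random.Random(seed)
    norms = [
        minkowski_norm([rng.random() for _ in range(dim)], p)
        for _ in range(n_points)
    ]
    return statistics.pstdev(norms) / statistics.mean(norms)


if __name__ == "__main__":
    # Compare a fractional norm (p = 0.5) with the p = 1 and Euclidean (p = 2)
    # norms as the dimension grows.
    for dim in (2, 20, 200):
        row = ", ".join(
            f"p={p}: {relative_variance(dim, p):.3f}" for p in (0.5, 1.0, 2.0)
        )
        print(f"dim={dim:4d}  {row}")
```

Running the script prints one row per dimension, so the ratio can be compared across exponents. Swapping the uniform generator for another distribution is the natural way to probe the abstract's claim that the ranking of exponents depends on the data distribution.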