The performance of similarity measures for search, indexing, and data mining applications tends to degrade rapidly as the dimensionality of the data increases. The effects of the so-called 'curse of dimensionality' have been studied by researchers for data sets generated according to a single data distribution. In this paper, we study the effects of this phenomenon on different similarity measures for multiply-distributed data. In particular, we assess the performance of shared-neighbor similarity measures, which are secondary similarity measures based on the rankings of data objects induced by some primary distance measure. We find that rank-based similarity measures can result in more stable performance than their associated primary distance measures.
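To make the idea of a secondary, rank-based similarity concrete, here is a minimal sketch in Python. It assumes Euclidean distance as the primary measure and defines the shared-neighbor similarity of two objects as the size of the overlap between their k-nearest-neighbor lists divided by k. That normalization, the helper names `knn_lists` and `shared_neighbor_similarity`, and the toy data are illustrative assumptions, not necessarily the formulation evaluated in the paper.

```python
import math

def knn_lists(points, k):
    """For each point, return the set of indices of its k nearest neighbors.

    Euclidean distance plays the role of the primary distance measure
    (an assumption made for this sketch).
    """
    neighbors = []
    for i, p in enumerate(points):
        # Rank every other point by its distance to p; break ties by index.
        ranked = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (math.dist(p, points[j]), j),
        )
        neighbors.append(set(ranked[:k]))
    return neighbors

def shared_neighbor_similarity(neighbors, i, j, k):
    """Secondary similarity: fraction of the k-NN lists shared by i and j."""
    return len(neighbors[i] & neighbors[j]) / k

# Toy data (hypothetical): two well-separated groups of three points each.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
]
k = 2
nn = knn_lists(points, k)

same_group = shared_neighbor_similarity(nn, 0, 1, k)   # points from one group
cross_group = shared_neighbor_similarity(nn, 0, 3, k)  # points from different groups
print(same_group, cross_group)
```

Because the secondary measure depends only on which objects appear in each neighbor list, and not on the distance values themselves, it is unaffected by any rescaling of the primary distances that preserves the rankings.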