In recent years, the effect of the curse of high dimensionality has been studied in great detail on several problems such as clustering, nearest neighbor search, and indexing. In high-dimensional space the data becomes sparse, and traditional indexing and algorithmic techniques fail from an efficiency and/or effectiveness perspective. Recent research results show that in high-dimensional space, the concepts of proximity, distance, or nearest neighbor may not even be qualitatively meaningful. In this paper, we view the dimensionality curse from the point of view of the distance metrics used to measure the similarity between objects. We specifically examine the behavior of the commonly used Lk norm and show that the problem of meaningfulness in high dimensionality is sensitive to the value of k. This means, for example, that the Manhattan distance metric (L1 norm) is consistently preferable to the Euclidean distance metric (L2 norm) for high-dimensional data mining applications. Using the intuition derived from our analysis, we introduce and examine a natural extension of the Lk norm to fractional distance metrics. We show that the fractional distance metric provides more meaningful results from both the theoretical and the empirical perspective. The results show that fractional distance metrics can significantly improve the effectiveness of standard clustering algorithms such as the k-means algorithm.
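To make the Lk norm and its fractional extension concrete, the following is a minimal sketch rather than the paper's implementation. It assumes the fractional distance is the usual Lk formula, (sum_i |x_i - y_i|^k)^(1/k), evaluated with 0 < k < 1, and it assumes that "meaningfulness" can be probed with the relative contrast (Dmax - Dmin) / Dmin between the farthest and nearest points from a query. The function names, the uniform random data, and the parameter defaults are all hypothetical choices made for illustration.

```python
import random


def minkowski_distance(x, y, k):
    """Lk-style distance (sum_i |x_i - y_i|^k)^(1/k) for any k > 0.

    k = 2 gives the Euclidean distance and k = 1 the Manhattan distance.
    For 0 < k < 1 this is the fractional case (assumed form); note that
    the triangle inequality no longer holds there.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(abs(a - b) ** k for a, b in zip(x, y)) ** (1.0 / k)


def relative_contrast(points, query, k):
    """Return (Dmax - Dmin) / Dmin over distances from query to points."""
    dists = [minkowski_distance(p, query, k) for p in points]
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


def demo(dim=100, n_points=500, seed=0, ks=(0.5, 1.0, 2.0)):
    """Compute the relative contrast for several k on uniform random data."""
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    query = [rng.random() for _ in range(dim)]
    return {k: relative_contrast(points, query, k) for k in ks}


if __name__ == "__main__":
    for k, contrast in demo().items():
        print(f"k={k}: relative contrast = {contrast:.3f}")
```

Running `demo()` with different values of `dim` and `ks` gives a rough way to check how the contrast between the nearest and farthest neighbor changes with k on synthetic data. Uniform data is only one possible test bed and results on real data sets may differ.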