When Is "Nearest Neighbor" Meaningful?

  • Authors:
  • Kevin S. Beyer; Jonathan Goldstein; Raghu Ramakrishnan; Uri Shaft

  • Venue:
  • ICDT '99 Proceedings of the 7th International Conference on Database Theory
  • Year:
  • 1999

Abstract

We explore the effect of dimensionality on the "nearest neighbor" problem. We show that under a broad set of conditions (much broader than independent and identically distributed dimensions), as dimensionality increases, the distance to the nearest data point approaches the distance to the farthest data point. To provide a practical perspective, we present empirical results on both real and synthetic data sets demonstrating that this effect can occur for as few as 10-15 dimensions. These results should not be interpreted to mean that high-dimensional indexing is never meaningful; we illustrate this point by identifying some high-dimensional workloads for which this effect does not occur. However, our results do emphasize that the methodology used almost universally in the database literature to evaluate high-dimensional indexing techniques is flawed and should be modified. In particular, most such techniques proposed in the literature are not evaluated against a simple linear scan, and are evaluated on workloads for which nearest neighbor is not meaningful. Often, even the reported experiments, when analyzed carefully, show that a linear scan would outperform the proposed techniques on the workloads studied at high (10-15) dimensionality!
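
The concentration effect described in the abstract is easy to reproduce with a small simulation. The sketch below is not from the paper; it assumes i.i.d. uniform data, Euclidean distance, and arbitrary sample sizes, and simply compares the nearest and farthest distances from random query points as dimensionality grows. The "relative contrast" it prints shrinks toward zero in higher dimensions, i.e. the nearest point is barely closer than the farthest one.

```python
# Illustrative sketch (assumptions: i.i.d. uniform data, Euclidean distance,
# arbitrary sample sizes) showing nearest/farthest distance concentration.
import numpy as np

rng = np.random.default_rng(0)

n_points = 1000   # size of the synthetic data set
n_queries = 50    # number of query points to average over

for dim in (2, 10, 15, 100):
    data = rng.random((n_points, dim))
    queries = rng.random((n_queries, dim))

    # Euclidean distances from each query point to every data point.
    dists = np.linalg.norm(data[None, :, :] - queries[:, None, :], axis=2)
    d_min = dists.min(axis=1)
    d_max = dists.max(axis=1)

    # Relative contrast (d_max - d_min) / d_min: as dimensionality grows,
    # this tends toward 0 for i.i.d. data, so "nearest" loses its meaning.
    contrast = ((d_max - d_min) / d_min).mean()
    print(f"dim={dim:4d}  mean relative contrast = {contrast:.3f}")
```

Running this with the parameters above already shows a sharp drop in contrast between 2 and 10-15 dimensions, consistent with the paper's claim that the effect appears at surprisingly low dimensionality for i.i.d. workloads.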