Re-designing distance functions and distance-based applications for high dimensional data

Authors: Charu C. Aggarwal
Affiliations: IBM T. J. Watson Research Center, Yorktown Heights, NY
Venue: ACM SIGMOD Record
Year: 2001

Citing 0
Cited 27

Outlier detection for high dimensional data
SIGMOD '01 Proceedings of the 2001 ACM SIGMOD international conference on Management of data

A human-computer cooperative system for effective high dimensional clustering
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining

ICEAGE: Interactive Clustering and Exploration of Large and High-Dimensional Geodata
Geoinformatica

Towards systematic design of distance functions for data mining applications
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining

Generative model-based clustering of directional data
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining

Subspace clustering for high dimensional data: a review
ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets

An effective and efficient algorithm for high-dimensional outlier detection
The VLDB Journal — The International Journal on Very Large Data Bases

Query-sensitive embeddings
Proceedings of the 2005 ACM SIGMOD international conference on Management of data

Feature bagging for outlier detection
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining

On the use of Human-Computer Interaction for Projected Nearest Neighbor Search
Data Mining and Knowledge Discovery

Query-sensitive embeddings
ACM Transactions on Database Systems (TODS)

The Concentration of Fractional Distances
IEEE Transactions on Knowledge and Data Engineering

Outlier detection in sensor networks
Proceedings of the 8th ACM international symposium on Mobile ad hoc networking and computing

Topic Extraction with AGAPE
ADMA '07 Proceedings of the 3rd international conference on Advanced Data Mining and Applications

Incremental clustering of dynamic data streams using connectivity based representative points
Data & Knowledge Engineering

A flexible framework to ease nearest neighbor search in multidimensional data spaces
Data & Knowledge Engineering

Data mining of vector–item patterns using neighborhood histograms
Knowledge and Information Systems

Boosting support vector machines using multiple dissimilarities
KES'07/WIRN'07 Proceedings of the 11th international conference, KES 2007 and XVII Italian workshop on neural networks conference on Knowledge-based intelligent information and engineering systems: Part I

A partially supervised metric multidimensional scaling algorithm for textual data visualization
IDA'07 Proceedings of the 7th international conference on Intelligent data analysis

On the combination of dissimilarities for gene expression data analysis
ICANN'07 Proceedings of the 17th international conference on Artificial neural networks

Can shared-neighbor distances defeat the curse of dimensionality?
SSDBM'10 Proceedings of the 22nd international conference on Scientific and statistical database management

Subspace similarity search: efficient k-NN queries in arbitrary subspaces
SSDBM'10 Proceedings of the 22nd international conference on Scientific and statistical database management

Electrostatic field framework for supervised and semi-supervised learning from incomplete data
Natural Computing: an international journal

Applying instance-based techniques to prediction of final outcome in acute stroke
Artificial Intelligence in Medicine

A survey on unsupervised outlier detection in high-dimensional numerical data
Statistical Analysis and Data Mining

On the equivalence of PLSI and projected clustering
ACM SIGMOD Record

Context-aware hybrid reasoning framework for pervasive healthcare
Personal and Ubiquitous Computing

Abstract

In recent years, the detrimental effects of the curse of high dimensionality have been studied in great detail for problems such as clustering, nearest neighbor search, and indexing. In high dimensional space the data becomes sparse, and traditional indexing and algorithmic techniques perform poorly. Recent research results show that in high dimensional space, the concept of proximity may not even be qualitatively meaningful [6]. In this paper, we outline the effects of generalizing low dimensional techniques to high dimensional applications and the natural effects of sparsity on distance-based applications. We outline guidelines for re-designing either the distance functions or the distance-based applications in a meaningful way for high dimensional domains. We provide novel perspectives and insights on some new lines of work for broadening application definitions in order to deal effectively with the dimensionality curse.
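
The claim that proximity loses meaning in high dimensional space can be illustrated with a small experiment. The sketch below is not taken from the paper: it assumes points drawn independently and uniformly from the unit hypercube, uses the Euclidean distance, and measures a relative contrast (Dmax - Dmin) / Dmin between a random query and a sample of points. The function name `relative_contrast` and its parameters are hypothetical and chosen for illustration only.

```python
import math
import random


def relative_contrast(dim, n_points=200, seed=0):
    """Return (Dmax - Dmin) / Dmin for Euclidean distances from one
    random query point to n_points random points in [0, 1]^dim.

    Assumption for illustration: coordinates are i.i.d. uniform.
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # Print the contrast for increasing dimensionality.
    for dim in (2, 10, 100, 1000):
        print(dim, round(relative_contrast(dim), 3))
```

Under these assumptions the contrast typically shrinks as the dimensionality grows, so the nearest and farthest neighbors of a query end up at nearly the same distance. Real data sets are rarely uniform, so the numbers should be read as a qualitative illustration rather than as a result from the paper.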