Subspace Selection for Clustering High-Dimensional Data

Authors:
Christian Baumgartner;Claudia Plant;Karin Kailing;Hans-Peter Kriegel;Peer Kröger
Affiliations:
University for Health Sciences, Austria;University for Health Sciences, Austria;University of Munich, Germany;University of Munich, Germany;University of Munich, Germany
Venue:
ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining
Year:
2004

Citing 0
Cited 12

A Generic Framework for Efficient Subspace Clustering of High-Dimensional Data
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining

Clustering high dimensional data: A graph-based relaxed optimization approach
Information Sciences: an International Journal

Dimensionality reduction for heterogeneous dataset in rushes editing
Pattern Recognition

Heidi matrix: nearest neighbor driven high dimensional data visualization
Proceedings of the ACM SIGKDD Workshop on Visual Analytics and Knowledge Discovery: Integrating Automated Analysis with Interactive Exploration

Subspace sums for extracting non-random data from massive noise
Knowledge and Information Systems

Enhanced visual separation of clusters by M-mapping to facilitate cluster analysis
VISUAL'07 Proceedings of the 9th international conference on Advances in visual information systems

Mining representative subspace clusters in high-dimensional data
FSKD'09 Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 1

A grid-based clustering algorithm for high-dimensional data streams
ADMA'05 Proceedings of the First international conference on Advanced Data Mining and Applications

Dynamic parallelization of grid-enabled web services
EGC'05 Proceedings of the 2005 European conference on Advances in Grid Computing

A grid-based subspace clustering algorithm for high-dimensional data streams
WISE'06 Proceedings of the 7th international conference on Web Information Systems

Interactive data mining with 3D-parallel-coordinate-trees
Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data

Short communication: Algorithm to determine ε-distance parameter in density based clustering
Expert Systems with Applications: An International Journal

Quantified Score

Hi-index	0.00

Abstract

In high-dimensional feature spaces, traditional clustering algorithms tend to break down in both efficiency and quality. Nevertheless, such data sets often contain clusters hidden in various subspaces of the original feature space. In this paper, we present a feature selection technique called SURFING (SUbspaces Relevant For clusterING) that finds all subspaces interesting for clustering and sorts them by relevance. The sorting is based on a quality criterion for the interestingness of a subspace, computed from the k-nearest neighbor distances of the objects. Because the method is nearly parameterless, it fits the unsupervised nature of the clustering task well. A broad evaluation on synthetic and real-world data sets demonstrates that SURFING is well suited to finding all relevant subspaces in high-dimensional, sparse data sets and produces better results than competing methods.
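
The abstract does not state the quality criterion itself, so the following Python sketch only illustrates the general idea of ranking subspaces by their k-nearest neighbor distances. The score used here (mean absolute deviation of the k-NN distances divided by their mean) is an assumed stand-in, not the definition from the paper, and the names knn_distance, subspace_score, and rank_subspaces are hypothetical. The intuition is that a subspace containing clusters mixes very small and very large k-NN distances, while a uniformly filled subspace gives nearly equal ones.

```python
import itertools
import math
import random


def knn_distance(points, dims, k):
    """k-th nearest-neighbor distance of every point, measured only on `dims`."""
    result = []
    for i, p in enumerate(points):
        dists = sorted(
            math.sqrt(sum((p[d] - q[d]) ** 2 for d in dims))
            for j, q in enumerate(points) if j != i
        )
        result.append(dists[k - 1])
    return result


def subspace_score(points, dims, k):
    """Assumed stand-in score (not the paper's criterion):
    mean absolute deviation of the k-NN distances divided by their mean."""
    nn = knn_distance(points, dims, k)
    mean = sum(nn) / len(nn)
    if mean == 0:
        return 0.0
    return sum(abs(mean - d) for d in nn) / (len(nn) * mean)


def rank_subspaces(points, k=5, max_dim=2):
    """Score every subspace with up to `max_dim` dimensions, best first."""
    n_dims = len(points[0])
    scored = []
    for size in range(1, max_dim + 1):
        for dims in itertools.combinations(range(n_dims), size):
            scored.append((subspace_score(points, dims, k), dims))
    return sorted(scored, reverse=True)


# Toy data: two tight clusters in dimensions 0 and 1, uniform noise in dimension 2,
# plus a few points that are uniform in every dimension.
random.seed(0)
points = []
for center in (0.2, 0.8):
    for _ in range(50):
        points.append((random.gauss(center, 0.01),
                       random.gauss(center, 0.01),
                       random.random()))
for _ in range(20):
    points.append((random.random(), random.random(), random.random()))

ranking = rank_subspaces(points, k=5, max_dim=2)
for score, dims in ranking:
    print(dims, round(score, 3))
```

On this toy data the clustered subspace (0, 1) should score well above the pure-noise subspace (2,). The exhaustive enumeration is used only for brevity and does not scale to many dimensions.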