DUSC: Dimensionality Unbiased Subspace Clustering

Authors:
Ira Assent;Ralph Krieger;Emmanuel Müller;Thomas Seidl
Venue:
ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
Year:
2007

Citing 0
Cited 27

VISA: visual subspace clustering analysis
ACM SIGKDD Explorations Newsletter - Special issue on visual analytics

Clustering multidimensional sequences in spatial and temporal databases
Knowledge and Information Systems

Finding non-redundant, statistically significant regions in high dimensional data: a novel approach to projected and subspace clustering
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining

Morpheus: interactive exploration of subspace clustering
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining

Pleiades: Subspace Clustering and Evaluation
ECML PKDD '08 Proceedings of the European conference on Machine Learning and Knowledge Discovery in Databases - Part II

EDSC: efficient density-based subspace clustering
Proceedings of the 17th ACM conference on Information and knowledge management

Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering
ACM Transactions on Knowledge Discovery from Data (TKDD)

HSM: Heterogeneous Subspace Mining in High Dimensional Data
SSDBM 2009 Proceedings of the 21st International Conference on Scientific and Statistical Database Management

Detection of orthogonal concepts in subspaces of high dimensional data
Proceedings of the 18th ACM conference on Information and knowledge management

Subspace and projected clustering: experimental evaluation and analysis
Knowledge and Information Systems

Evaluating clustering in subspace projections of high dimensional data
Proceedings of the VLDB Endowment

Projected Gustafson Kessel Clustering
RSFDGrC '09 Proceedings of the 12th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing

SubClass: classification of multidimensional noisy data using subspace clusters
PAKDD'08 Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining

Mining representative subspace clusters in high-dimensional data
FSKD'09 Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 1

Adaptive outlierness for subspace outlier ranking
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management

An unbiased distance-based outlier detection approach for high-dimensional data
DASFAA'11 Proceedings of the 16th international conference on Database systems for advanced applications - Volume Part I

Projected Gustafson-Kessel clustering algorithm and its convergence
Transactions on rough sets XIV

Efficient selectivity estimation by histogram construction based on subspace clustering
SSDBM'11 Proceedings of the 23rd international conference on Scientific and statistical database management

Scalable density-based subspace clustering
Proceedings of the 20th ACM international conference on Information and knowledge management

External evaluation measures for subspace clustering
Proceedings of the 20th ACM international conference on Information and knowledge management

Exploiting constraint inconsistence for dimension selection in subspace clustering: A semi-supervised approach
Neurocomputing

Subspace clustering
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery

Clustering high dimensional data
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery

A survey on unsupervised outlier detection in high-dimensional numerical data
Statistical Analysis and Data Mining

A survey on enhanced subspace clustering
Data Mining and Knowledge Discovery

Stochastic subspace search for top-k multi-view clustering
Proceedings of the 4th MultiClust Workshop on Multiple Clusterings, Multi-view Data, and Multi-source Knowledge-driven Clustering

Finding multiple global linear correlations in sparse and noisy data sets
Knowledge-Based Systems


Abstract

To gain insight into today's large data resources, data mining provides automatic aggregation techniques. Clustering aims at grouping data such that objects within groups are similar while objects in different groups are dissimilar. In scenarios with many attributes or with noise, clusters are often hidden in subspaces of the data and do not show up in the full-dimensional space. For these applications, subspace clustering methods aim at detecting clusters in any subspace. Existing subspace clustering approaches fall prey to an effect we call dimensionality bias. As the dimensionality of subspaces varies, approaches that do not take this effect into account fail to separate clusters from noise. We give a formal definition of dimensionality bias and analyze its consequences for subspace clustering. We propose a dimensionality unbiased subspace clustering (DUSC) definition based on statistical foundations. In thorough experiments on synthetic and real-world data, we show that our approach outperforms existing subspace clustering algorithms.
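
The dimensionality bias mentioned in the abstract can be illustrated with a small calculation. The sketch below is not the paper's definition. It assumes uniformly distributed noise in the unit hypercube, a fixed-radius Euclidean neighborhood, and no boundary effects; the function names `unit_ball_volume`, `expected_neighbors`, and `density_ratio` are hypothetical and chosen only for this illustration. Under these assumptions, the number of neighbors expected by chance shrinks rapidly as the subspace dimensionality grows, so a fixed count threshold means something different in each subspace. Dividing an observed count by the expected count is one simple way to make the values comparable.

```python
import math


def unit_ball_volume(d: int) -> float:
    """Volume of the d-dimensional Euclidean ball of radius 1."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def expected_neighbors(n: int, eps: float, d: int) -> float:
    """Expected number of points inside an eps-ball when n points are drawn
    uniformly from the unit hypercube [0, 1]^d (assumption: boundary
    effects are ignored)."""
    return n * unit_ball_volume(d) * eps ** d


def density_ratio(observed: int, n: int, eps: float, d: int) -> float:
    """Observed neighbor count relative to the count expected under
    uniform noise. Values well above 1 suggest a denser-than-noise region.
    This normalization is an illustrative assumption, not the DUSC measure."""
    return observed / expected_neighbors(n, eps, d)


if __name__ == "__main__":
    n, eps = 10_000, 0.1
    observed = 100  # the same raw count, interpreted in different subspaces
    for d in range(1, 6):
        exp = expected_neighbors(n, eps, d)
        ratio = density_ratio(observed, n, eps, d)
        print(f"d={d}  expected={exp:.2f}  ratio={ratio:.2f}")
```

With these settings, a raw count of 100 neighbors is far below the chance level in a one-dimensional subspace but above it in a three-dimensional one, which is why an unnormalized density threshold cannot be compared across subspaces of different dimensionality.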