SSC: statistical subspace clustering

Authors:
Laurent Candillier;Isabelle Tellier;Fabien Torre;Olivier Bousquet
Affiliations:
GRAppA, Charles de Gaulle University, Lille 3;GRAppA, Charles de Gaulle University, Lille 3;GRAppA, Charles de Gaulle University, Lille 3;Pertinence, Paris
Venue:
MLDM'05 Proceedings of the 4th international conference on Machine Learning and Data Mining in Pattern Recognition
Year:
2005

Citing 4
Cited 6

Automatic subspace clustering of high dimensional data for data mining applications

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Fast algorithms for projected clustering

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Entropy-based subspace clustering for mining numerical data

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Mixtures of Rectangles: Interpretable Soft Clustering

ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning

Comparing State-of-the-Art Collaborative Filtering Systems

MLDM '07 Proceedings of the 5th international conference on Machine Learning and Data Mining in Pattern Recognition
Clust-XPaths: clustering of XML paths

MLDM'11 Proceedings of the 7th international conference on Machine learning and data mining in pattern recognition
Cascade evaluation of clustering algorithms

ECML'06 Proceedings of the 17th European conference on Machine Learning
A flexible structured-based representation for XML document mining

INEX'05 Proceedings of the 4th international conference on Initiative for the Evaluation of XML Retrieval
Transforming XML trees for efficient classification and clustering

INEX'05 Proceedings of the 4th international conference on Initiative for the Evaluation of XML Retrieval
Post-processing strategies for improving local gene expression pattern analysis

International Journal of Data Mining and Bioinformatics

Quantified Score

Hi-index	0.00

Visualization

Abstract

Subspace clustering is an extension of traditional clustering that seeks to find clusters in different subspaces within a dataset. This is a particularly important challenge with high dimensional data where the curse of dimensionality occurs. It has also the benefit of providing smaller descriptions of the clusters found. Existing methods only consider numerical databases and do not propose any method for clusters visualization. Besides, they require some input parameters difficult to set for the user. The aim of this paper is to propose a new subspace clustering algorithm, able to tackle databases that may contain continuous as well as discrete attributes, requiring as few user parameters as possible, and producing an interpretable output. We present a method based on the use of the well-known EM algorithm on a probabilistic model designed under some specific hypotheses, allowing us to present the result as a set of rules, each one defined with as few relevant dimensions as possible. Experiments, conducted on artificial as well as real databases, show that our algorithm gives robust results, in terms of classification and interpretability of the output.