Outlier-robust clustering using independent components

Authors:
Christian Böhm; Christos Faloutsos; Claudia Plant
Affiliations:
University of Munich, Munich, Germany; Carnegie Mellon University, Pittsburgh, PA, USA; Technical University of Munich, Munich, Germany
Venue:
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Year:
2008

Abstract

How can we efficiently find a clustering, i.e., a concise description of the cluster structure, of a data set that contains an unknown number of clusters of different shapes and distributions and is contaminated by noise? Most existing clustering methods are restricted to the Gaussian cluster model and are very sensitive to noise. If the cluster content follows a non-Gaussian distribution and/or the data set contains a few outliers belonging to no cluster, then either the computed data distribution does not match the true data distribution well, or an unnaturally high number of clusters is required to represent it. In this paper we propose OCI (Outlier-robust Clustering using Independent Components), a clustering method that overcomes these problems by (1) using the exponential power distribution (EPD) as the cluster model, which generalizes the Gaussian, uniform, Laplacian, and many other distribution functions, (2) applying Independent Component Analysis (ICA) both to determine the main directions inside a cluster and to find split planes in a top-down clustering approach, and (3) defining an efficient and effective filter for outliers based on EPD and ICA. Our method is parameter-free and, as a top-down clustering approach, very efficient. An extensive experimental evaluation demonstrates both the accuracy of the resulting clustering and the efficiency of our method.
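
To make the abstract's claim that the EPD generalizes the Gaussian, uniform, and Laplacian distributions concrete, here is a minimal one-dimensional sketch. It assumes the common "generalized normal" parameterization with location `mu`, scale `alpha`, and shape `beta`; the paper may use a different parameterization. The function names (`epd_pdf`, `gaussian_pdf`, `laplace_pdf`) are illustrative and not taken from the paper.

```python
import math


def epd_pdf(x, mu=0.0, alpha=1.0, beta=2.0):
    """Density of a 1-D exponential power distribution.

    Assumed parameterization (generalized normal form, not necessarily the
    paper's): f(x) = beta / (2 * alpha * Gamma(1 / beta))
                     * exp(-(|x - mu| / alpha) ** beta)
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-((abs(x - mu) / alpha) ** beta))


def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Standard Gaussian density, used only for comparison."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def laplace_pdf(x, mu=0.0, b=1.0):
    """Standard Laplace density, used only for comparison."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)


if __name__ == "__main__":
    for x in (-1.5, -0.5, 0.0, 0.5, 1.5):
        print(
            f"x={x:+.1f}",
            # beta = 2 with alpha = sqrt(2) * sigma reproduces the Gaussian.
            f"gauss={epd_pdf(x, alpha=math.sqrt(2.0), beta=2.0):.4f}",
            # beta = 1 reproduces the Laplacian with scale alpha.
            f"laplace={epd_pdf(x, alpha=1.0, beta=1.0):.4f}",
            # A large beta approaches the uniform density on [mu - alpha, mu + alpha].
            f"near-uniform={epd_pdf(x, alpha=1.0, beta=50.0):.4f}",
        )
```

Under this parameterization, the shape parameter alone moves the cluster model between heavy-tailed (small `beta`), Gaussian (`beta = 2`), and nearly flat (large `beta`) densities, which is why a single model family can describe clusters of differing distribution.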