How can we efficiently find a clustering, i.e., a concise description of the cluster structure, of a data set that contains an unknown number of clusters of different shapes and distributions and is contaminated by noise? Most existing clustering methods are restricted to the Gaussian cluster model and are very sensitive to noise. If the cluster content follows a non-Gaussian distribution, or if the data set contains a few outliers that belong to no cluster, then either the computed data distribution matches the true data distribution poorly or an unnaturally high number of clusters is required to represent it. In this paper we propose OCI (Outlier-robust Clustering using Independent Components), a clustering method that overcomes these problems by (1) using the exponential power distribution (EPD) as the cluster model, which generalizes the Gaussian, uniform, Laplacian, and many other distribution functions; (2) applying Independent Component Analysis (ICA) both to determine the main directions inside a cluster and to find split planes in a top-down clustering approach; and (3) defining an efficient and effective filter for outliers based on EPD and ICA. Our method is parameter-free and, as a top-down clustering approach, very efficient. An extensive experimental evaluation demonstrates both the accuracy of the resulting clustering and the efficiency of our method.
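To illustrate why the EPD can serve as a single model covering Gaussian, Laplacian, and near-uniform clusters, the following minimal sketch evaluates a one-dimensional EPD density. The abstract does not state the parameterization used by OCI, so the form below (location `mu`, scale `alpha`, shape `beta`, as in the common generalized normal distribution) is an assumption, and the function names `epd_pdf`, `gaussian_pdf`, and `laplace_pdf` are hypothetical helpers introduced only for this illustration.

```python
import math


def epd_pdf(x, mu=0.0, alpha=1.0, beta=2.0):
    """Density of a 1-D exponential power distribution.

    Assumed parameterization (not taken from the paper):
        f(x) = beta / (2 * alpha * Gamma(1 / beta)) * exp(-(|x - mu| / alpha) ** beta)
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    z = abs(x - mu) / alpha
    try:
        exponent = z ** beta
    except OverflowError:
        # The exponent is astronomically large, so the density is effectively zero.
        return 0.0
    norm = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return norm * math.exp(-exponent)


def gaussian_pdf(x, mu, sigma):
    """Reference Gaussian density, used only for comparison."""
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / (sigma * math.sqrt(2.0 * math.pi))


def laplace_pdf(x, mu, b):
    """Reference Laplacian density, used only for comparison."""
    return math.exp(-abs(x - mu) / b) / (2.0 * b)


if __name__ == "__main__":
    sigma = 1.5
    # beta = 2 with alpha = sigma * sqrt(2) reproduces the Gaussian.
    print(epd_pdf(0.7, 0.0, sigma * math.sqrt(2.0), 2.0), gaussian_pdf(0.7, 0.0, sigma))
    # beta = 1 with alpha = b reproduces the Laplacian.
    print(epd_pdf(0.7, 0.0, 2.0, 1.0), laplace_pdf(0.7, 0.0, 2.0))
    # A large beta approaches a uniform density on [mu - alpha, mu + alpha].
    print(epd_pdf(0.5, 0.0, 1.0, 50.0), epd_pdf(1.5, 0.0, 1.0, 50.0))
```

Under this parameterization, the shape parameter alone moves the model between heavy-tailed (Laplacian), Gaussian, and flat-topped (near-uniform) clusters, which is the property the abstract relies on when it describes the EPD as a generalization of these distributions.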