How do we find a natural clustering of a real-world point set that contains an unknown number of clusters with different shapes and that may be contaminated by noise? Most clustering algorithms were designed under certain assumptions (e.g., Gaussianity), often require the user to supply input parameters, and are sensitive to noise. In this paper, we propose a robust framework for determining a natural clustering of a given data set, based on the minimum description length (MDL) principle. The proposed framework, Robust Information-theoretic Clustering (RIC), is orthogonal to any known clustering algorithm: given a preliminary clustering, RIC purifies the clusters of noise and adjusts the clustering so that it simultaneously determines the most natural number and shape (subspace) of the clusters. RIC can be combined with any clustering technique, from K-means and K-medoids to advanced methods such as spectral clustering. In fact, RIC can purify and improve even an initial coarse clustering produced by very simple methods such as grid-based space partitioning. Moreover, RIC scales well with the data set size. Extensive experiments on synthetic and real-world data sets validate the proposed RIC framework.
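The core MDL idea behind cluster purification can be illustrated with a minimal sketch: a point belongs to a cluster only if the cluster's model encodes it in fewer bits than a generic noise model does. The sketch below is a simplified 1-D illustration, not the actual RIC algorithm; it uses a plain (non-robust) Gaussian fit and a uniform noise model over the data range, whereas RIC employs robust estimation and richer per-cluster models. All function names here are hypothetical.

```python
import math

def gauss_bits(x, mu, sigma):
    # Code length (bits) of x under N(mu, sigma^2),
    # up to a constant discretization term.
    p = math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))
    return -math.log2(p)

def uniform_bits(lo, hi):
    # Code length of a point under a uniform "noise" model over [lo, hi].
    return math.log2(hi - lo)

def purify(cluster, lo, hi):
    """Split a tentative cluster into (kept points, noise) by comparing
    per-point code lengths: a point is kept only if the cluster's Gaussian
    model encodes it more cheaply than the uniform noise model.

    Simplified illustration: fits mu/sigma naively on the contaminated
    cluster, unlike RIC's robust estimation."""
    mu = sum(cluster) / len(cluster)
    var = sum((x - mu) ** 2 for x in cluster) / len(cluster)
    sigma = math.sqrt(var) or 1e-9  # avoid a degenerate zero-width model
    kept, noise = [], []
    for x in cluster:
        (kept if gauss_bits(x, mu, sigma) < uniform_bits(lo, hi) else noise).append(x)
    return kept, noise

# A tight group around 1.0 plus one far-away point in the range [0, 10]:
kept, noise = purify([1.0, 1.1, 0.9, 1.05, 0.95, 9.0], 0.0, 10.0)
# The outlier 9.0 costs more bits under the Gaussian than under the
# uniform model, so it is flagged as noise.
```

The same comparison, summed over all points and all candidate models, is what lets an MDL criterion pick the number and shape of clusters without user parameters: the clustering with the shortest total description wins.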