COALA: A Novel Approach for the Extraction of an Alternate Clustering of High Quality and High Dissimilarity

Authors:
Eric Bae;James Bailey
Affiliations:
University of Melbourne, Australia;University of Melbourne, Australia
Venue:
ICDM '06 Proceedings of the Sixth International Conference on Data Mining
Year:
2006

Citing 0
Cited 21

Using instance-level constraints in agglomerative hierarchical clustering: theoretical and empirical results
Data Mining and Knowledge Discovery

A principled and flexible framework for finding alternative clusterings
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining

Avoiding Bias in Text Clustering Using Constrained K-means and May-Not-Links
ICTIR '09 Proceedings of the 2nd International Conference on Theory of Information Retrieval: Advances in Information Retrieval Theory

ACORN: towards automating domain specific ontology construction process
APWeb'08 Proceedings of the 10th Asia-Pacific web conference on Progress in WWW research and development

A hierarchical information theoretic technique for the discovery of non linear alternative clusterings
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining

Unifying dependent clustering and disparate clustering for non-homogeneous data
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining

Learning multiple nonredundant clusterings
ACM Transactions on Knowledge Discovery from Data (TKDD)

A clustering comparison measure using density profiles and its application to the discovery of alternate clusterings
Data Mining and Knowledge Discovery

Improving alternative text clustering quality in the avoiding bias task with spectral and flat partition algorithms
DEXA'10 Proceedings of the 21st international conference on Database and expert systems applications: Part II

An information theoretic framework for data mining
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining

An experimental study of constrained clustering effectiveness in presence of erroneous constraints
Information Processing and Management: an International Journal

The instance easiness of supervised learning for cluster validity
PAKDD'11 Proceedings of the 15th international conference on New Frontiers in Applied Data Mining

Language modelling of constraints for text clustering
ECIR'12 Proceedings of the 34th European conference on Advances in Information Retrieval

A novel approach for finding alternative clusterings using feature selection
DASFAA'12 Proceedings of the 17th international conference on Database Systems for Advanced Applications - Volume Part I

Subspace clustering
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery

Multi-view clustering using mixture models in subspace projections
Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining

Model-based clustering of high-dimensional data: Variable selection versus facet determination
International Journal of Approximate Reasoning

Regularized nonnegative shared subspace learning
Data Mining and Knowledge Discovery

How to "alternatize" a clustering algorithm
Data Mining and Knowledge Discovery

Generating multiple alternative clusterings via globally optimal subspaces
Data Mining and Knowledge Discovery

Hierarchical constraints
Machine Learning

Abstract

Cluster analysis has long been a fundamental task in data mining and machine learning. However, traditional clustering methods concentrate on producing a single solution, even though multiple alternative clusterings may exist. It is thus difficult for the user to validate whether the given solution is in fact appropriate, particularly for large and complex datasets. In this paper we explore the critical requirements for systematically finding a new clustering when an already known clustering is available, and we propose a novel algorithm, COALA, to discover this new clustering. Our approach is driven by two important factors: dissimilarity and quality. These are especially important for finding a new clustering that is highly informative about the underlying structure of the data but is at the same time distinctly different from the provided clustering. We undertake an experimental analysis and show that our method is able to outperform existing techniques on both synthetic and real datasets.
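
The two factors named in the abstract, dissimilarity and quality, can be made concrete with a small sketch. It is not the COALA algorithm and does not use the measures from the paper. It assumes a pair-counting Jaccard dissimilarity between two labelings and the mean within-cluster distance as a stand-in for quality. The function names, the toy points, and both labelings are hypothetical.

```python
from itertools import combinations
from math import dist


def jaccard_dissimilarity(labels_a, labels_b):
    """1 - pair-counting Jaccard index between two labelings of the same points.

    Assumed illustrative measure, not the one used in the paper.
    """
    both = either = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        both += same_a and same_b      # pair grouped together in both clusterings
        either += same_a or same_b     # pair grouped together in at least one
    return 1.0 - both / either if either else 0.0


def mean_within_cluster_distance(points, labels):
    """Average Euclidean distance over pairs sharing a cluster (lower is tighter).

    Assumed stand-in for clustering quality.
    """
    dists = [dist(points[i], points[j])
             for i, j in combinations(range(len(points)), 2)
             if labels[i] == labels[j]]
    return sum(dists) / len(dists) if dists else 0.0


# Hypothetical toy data: four corners of a 3-by-1 rectangle.
points = [(0, 0), (0, 1), (3, 0), (3, 1)]
given = [0, 0, 1, 1]        # known clustering: split by x-coordinate
alternative = [0, 1, 0, 1]  # candidate alternative: split by y-coordinate

print(jaccard_dissimilarity(given, alternative))
print(mean_within_cluster_distance(points, given))
print(mean_within_cluster_distance(points, alternative))
```

On this toy data the candidate shares no co-clustered pair with the given clustering, so it is as dissimilar as possible, but its clusters are looser than the given ones. That is the kind of trade-off between dissimilarity and quality that the abstract describes.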