A clustering comparison measure using density profiles and its application to the discovery of alternate clusterings

Authors:
Eric Bae;James Bailey;Guozhu Dong
Affiliations:
NICTA Victoria Laboratory, Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Melbourne, Australia;NICTA Victoria Laboratory, Department of Computer Science and Software Engineering, The University of Melbourne, Parkville, Melbourne, Australia;Department of Computer Science and Engineering, Wright State University, Dayton, USA
Venue:
Data Mining and Knowledge Discovery
Year:
2010



Abstract

Data clustering is a fundamental and widely used method of data analysis. Its subjective nature, however, means that different clustering algorithms, or different parameter settings, can produce widely varying and sometimes conflicting results. This has led to the use of clustering comparison measures to quantify the degree of similarity between alternative clusterings. Existing measures, though, can be limited in their ability to assess similarity and sometimes generate unintuitive results. They also cannot be applied to compare clusterings that contain different data points, an activity that is important in scenarios such as data stream analysis. In this paper, we introduce a new clustering similarity measure, known as ADCO, which aims to address some limitations of existing measures by allowing greater flexibility of comparison via the use of density profiles to characterize a clustering. In particular, it adopts a 'data mining style' philosophy to clustering comparison, whereby two clusterings are considered more similar if they are likely to give rise to similar types of prediction models. Furthermore, we show that this new measure can be applied as a highly effective objective function within a new algorithm, known as MAXIMUS, for generating alternate clusterings.
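
To make the idea of comparing clusterings through density profiles more concrete, the following is a minimal Python sketch, not the paper's definition of ADCO. It assumes equal-width bins per attribute, a profile made of per-cluster bin counts, a brute-force search over cluster matchings, and normalization by the larger self-similarity. The function names (`density_profile`, `profile_similarity`, `density_similarity`) and the `n_bins` parameter are hypothetical and chosen only for illustration.

```python
from itertools import permutations

import numpy as np


def density_profile(X, labels, n_clusters, edges):
    """Per-cluster counts of points in each bin of each attribute (assumed profile form)."""
    n_attrs = X.shape[1]
    n_bins = len(edges[0]) - 1
    profile = np.zeros((n_clusters, n_attrs, n_bins))
    for k in range(n_clusters):
        members = X[labels == k]
        for a in range(n_attrs):
            counts, _ = np.histogram(members[:, a], bins=edges[a])
            profile[k, a] = counts
    # One flat vector of bin counts per cluster.
    return profile.reshape(n_clusters, -1)


def profile_similarity(p, q):
    """Largest total dot product over one-to-one cluster matchings (brute force)."""
    k = p.shape[0]
    return max(
        sum(p[i] @ q[j] for i, j in enumerate(perm))
        for perm in permutations(range(k))
    )


def density_similarity(X1, labels1, X2, labels2, n_clusters, n_bins=5):
    """Similarity in [0, 1] between two clusterings, possibly over different points."""
    # Shared equal-width bin edges so both profiles live in the same space.
    lo = np.minimum(X1.min(axis=0), X2.min(axis=0))
    hi = np.maximum(X1.max(axis=0), X2.max(axis=0))
    edges = [np.linspace(lo[a], hi[a], n_bins + 1) for a in range(X1.shape[1])]
    p = density_profile(X1, labels1, n_clusters, edges)
    q = density_profile(X2, labels2, n_clusters, edges)
    norm = max(profile_similarity(p, p), profile_similarity(q, q))
    return profile_similarity(p, q) / norm


X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 0.8]])
same_up_to_relabelling = density_similarity(
    X, np.array([0, 0, 1, 1]), X, np.array([1, 1, 0, 0]), n_clusters=2
)
different_grouping = density_similarity(
    X, np.array([0, 0, 1, 1]), X, np.array([0, 1, 0, 1]), n_clusters=2
)
```

Because the sketch compares bin counts rather than point memberships, the two clusterings passed to `density_similarity` need not cover the same data points, which is the property the abstract highlights for data stream settings. The permutation search grows factorially with the number of clusters, so an assignment solver would be needed beyond small cases.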