Data clustering is a fundamental and widely used method of data analysis. Its subjective nature, however, means that different clustering algorithms, or different parameter settings, can produce widely varying and sometimes conflicting results. This has motivated the use of clustering comparison measures to quantify the degree of similarity between alternative clusterings. Existing measures, though, can be limited in their ability to assess similarity and sometimes produce unintuitive results. They also cannot compare clusterings that contain different data points, a capability that is important in scenarios such as data stream analysis. In this paper, we introduce a new clustering similarity measure, ADCO, which aims to address some of these limitations by using density profiles to characterize a clustering, allowing greater flexibility of comparison. In particular, it adopts a 'data mining style' philosophy to clustering comparison, whereby two clusterings are considered more similar if they are likely to give rise to similar types of prediction models. Furthermore, we show that this measure can serve as a highly effective objective function within a new algorithm, MAXIMUS, for generating alternate clusterings.
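The abstract names the ingredients of ADCO but not its formula. The rough illustration below makes several assumptions. It treats a density profile as a per-cluster histogram of each attribute over fixed-width bins, with counts divided by the number of points. It then scores two clusterings by the best one-to-one matching of cluster profiles, normalized by the larger self-similarity. This is a simplified sketch in the spirit of the measure, not the paper's exact definition. The names `density_profile`, `matched_similarity`, and `adco_like`, the bin layout, and the size normalization are all hypothetical choices. Because only binned profiles are compared, the sketch can score clusterings built on different data points.

```python
from itertools import permutations
from typing import List, Sequence

Profile = List[List[float]]  # hypothetical: one flattened density vector per cluster


def density_profile(points: Sequence[Sequence[float]], labels: Sequence[int],
                    k: int, bins: int, lo: Sequence[float], hi: Sequence[float]) -> Profile:
    """Per-cluster, per-attribute bin counts, divided by the number of points (assumed layout)."""
    d = len(lo)
    counts = [[0] * (d * bins) for _ in range(k)]
    for x, c in zip(points, labels):
        for a in range(d):
            width = (hi[a] - lo[a]) / bins
            b = int((x[a] - lo[a]) / width) if width > 0 else 0
            b = min(max(b, 0), bins - 1)  # clamp so a value equal to hi lands in the last bin
            counts[c][a * bins + b] += 1
    n = len(points)
    return [[v / n for v in row] for row in counts]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(p * q for p, q in zip(u, v))


def matched_similarity(P: Profile, Q: Profile) -> float:
    """Best one-to-one cluster matching by brute force; only practical for small k."""
    k = max(len(P), len(Q))
    width = len(P[0]) if P else len(Q[0])
    # Pad the clustering with fewer clusters using zero profiles (they add nothing to the sum).
    P = P + [[0.0] * width] * (k - len(P))
    Q = Q + [[0.0] * width] * (k - len(Q))
    return max(sum(_dot(P[i], Q[p[i]]) for i in range(k)) for p in permutations(range(k)))


def adco_like(P: Profile, Q: Profile) -> float:
    """Score in [0, 1]; equals 1 when profiles are identical up to cluster relabelling."""
    denom = max(matched_similarity(P, P), matched_similarity(Q, Q))
    return matched_similarity(P, Q) / denom if denom > 0 else 0.0


pts = [(0.0,), (1.0,), (2.0,), (3.0,)]
A = density_profile(pts, [0, 0, 1, 1], k=2, bins=2, lo=[0.0], hi=[3.0])
B = density_profile(pts, [1, 1, 0, 0], k=2, bins=2, lo=[0.0], hi=[3.0])  # relabelled A
C = density_profile(pts, [0, 1, 0, 1], k=2, bins=2, lo=[0.0], hi=[3.0])  # different split
other = [(0.5,), (0.2,), (2.5,), (2.9,)]  # different data points, same structure as A
D = density_profile(other, [0, 0, 1, 1], k=2, bins=2, lo=[0.0], hi=[3.0])
print(adco_like(A, B), adco_like(A, C), adco_like(A, D))  # 1.0 0.5 1.0
```

The brute-force matching is exponential in the number of clusters. For larger k, one would swap in an assignment solver such as SciPy's `linear_sum_assignment` with `maximize=True`.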