Clustering validation is concerned with assessing the quality of clustering solutions. Because clustering is unsupervised and highly exploratory, clustering validation has long been an important research problem. Existing validity measures, including entropy-based and distance-based indices, have significant shortcomings. For many datasets from the UCI repository, they fail to recognize that the expert-determined classes are the best clusters, and they frequently favor clusterings with a larger number of clusters. This weakness reflects their inability to accurately capture intra-cluster coherence and inter-cluster separation. This paper proposes a novel Contrast Pattern based Clustering Quality index (CPCQ) for categorical data. CPCQ uses the quality and diversity of the contrast patterns that distinguish the clusters of a given clustering. High-quality contrast patterns can characterize the clusters and discriminate each cluster from the others. The CPCQ index rests on the rationale that a high-quality clustering should have many diversified, high-quality contrast patterns among its clusters. The quality of an individual contrast pattern is defined in terms of its length, its support, and the length of its corresponding closed pattern. The quality measure for "many diversified" contrast patterns is defined in terms of the quality and diversity of selected groups of contrast patterns. These groups are chosen to minimize overlap in items and matching transactions, both among the patterns and among the groups. Experiments show that the CPCQ index (1) does not require the user to provide a distance function; (2) does not give inappropriate preference to a larger number of clusters; and (3) can recognize that expert-determined classes are the best clusters for many datasets from the UCI repository.
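The Python sketch below illustrates only the per-pattern ingredients named above: a pattern's length, its support, and the length of its closed pattern. It rests on several assumptions that are not taken from the paper. Each categorical record is encoded as a set of `attribute=value` items. A contrast pattern is treated as an itemset that reaches a minimum support in one cluster and never occurs in any other cluster, which is a "jumping" contrast pattern. The closed pattern is taken to be the intersection of the transactions in that cluster that contain the pattern. The function names, the `min_support` threshold and the toy data are hypothetical. The sketch does not reproduce how CPCQ combines these quantities into a score or how it selects diversified groups.

```python
from typing import FrozenSet, List, Sequence, Tuple

# Assumption: a categorical record is a set of "attribute=value" items.
Transaction = FrozenSet[str]
Cluster = Sequence[Transaction]


def support(pattern: FrozenSet[str], cluster: Cluster) -> float:
    """Fraction of transactions in the cluster that contain every item of the pattern."""
    if not cluster:
        return 0.0
    return sum(1 for t in cluster if pattern <= t) / len(cluster)


def is_contrast_pattern(pattern: FrozenSet[str], clusters: List[Cluster],
                        target: int, min_support: float) -> bool:
    """Hypothetical test: frequent in the target cluster, absent from all others
    (a 'jumping' contrast pattern; CPCQ's exact definition may differ)."""
    if support(pattern, clusters[target]) < min_support:
        return False
    return all(support(pattern, c) == 0.0
               for i, c in enumerate(clusters) if i != target)


def closed_pattern(pattern: FrozenSet[str], cluster: Cluster) -> FrozenSet[str]:
    """Closure of the pattern within one cluster: the items shared by every
    transaction of that cluster that contains the pattern (an assumption)."""
    matching = [t for t in cluster if pattern <= t]
    if not matching:
        raise ValueError("pattern does not occur in this cluster")
    return frozenset.intersection(*matching)


def pattern_stats(pattern: FrozenSet[str], cluster: Cluster) -> Tuple[int, float, int]:
    """Return (length, support, closed-pattern length), the three quantities the
    abstract names; how CPCQ weighs them is not reproduced here."""
    return (len(pattern), support(pattern, cluster),
            len(closed_pattern(pattern, cluster)))


if __name__ == "__main__":
    # Hypothetical toy clustering of four categorical records into two clusters.
    clusters: List[Cluster] = [
        [frozenset({"color=red", "shape=round", "size=small"}),
         frozenset({"color=red", "shape=round", "size=large"})],
        [frozenset({"color=blue", "shape=square", "size=small"}),
         frozenset({"color=green", "shape=square", "size=small"})],
    ]
    p = frozenset({"color=red"})
    print(is_contrast_pattern(p, clusters, target=0, min_support=0.5))  # True
    print(sorted(closed_pattern(p, clusters[0])))  # ['color=red', 'shape=round']
    print(pattern_stats(p, clusters[0]))  # (1, 1.0, 2)
```

In this toy example, `color=red` occurs only in the first cluster, so it discriminates that cluster. Its closed pattern also picks up `shape=round`, because every red record in that cluster is round. By contrast, `size=small` occurs in both clusters, so it fails the contrast test for either one.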