Most algorithmic work in data mining focuses on designing algorithms that solve a learning problem. Here we instead design algorithms that determine how easy or difficult a problem instance is. Clustering under constraints has recently received much attention in the data mining community. The constraints can be viewed as restricting, either directly or indirectly, the search space of a clustering algorithm to the feasible clusterings only. However, to our knowledge, no prior work explores methods for counting the feasible clusterings or computing other measures of difficulty, nor the importance of such measures. We present two approaches to efficiently characterize the difficulty of satisfying must-link (ML) and cannot-link (CL) constraints: calculating the fractional chromatic polynomial of the constraint graph using linear programming (LP), and approximately counting the number of feasible clusterings using Markov chain Monte Carlo (MCMC) samplers. We show that these measures are correlated with the classical performance measures of constrained clustering algorithms. From these insights and our algorithms, we construct new methods for generating and pruning constraints and empirically demonstrate their usefulness.
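To make the notion of a feasible clustering concrete, the following is a minimal sketch that counts feasible clusterings of a tiny instance by exhaustive enumeration. It is not the LP or MCMC method of the paper; brute force is swapped in purely for illustration and is only practical for a handful of points. The function names `is_feasible` and `count_feasible`, and the choice to count labeled assignments (cluster labels treated as distinguishable), are assumptions made for this example.

```python
from itertools import product


def is_feasible(labels, must_link, cannot_link):
    """Return True if the assignment satisfies every ML and CL constraint."""
    return (
        all(labels[a] == labels[b] for a, b in must_link)
        and all(labels[a] != labels[b] for a, b in cannot_link)
    )


def count_feasible(n_points, k, must_link, cannot_link):
    """Count labeled assignments of n_points to k clusters that are feasible.

    Illustrative brute force: enumerates all k**n_points assignments,
    so it is exponential in n_points.
    """
    return sum(
        is_feasible(labels, must_link, cannot_link)
        for labels in product(range(k), repeat=n_points)
    )


if __name__ == "__main__":
    # Hypothetical toy instance: 4 points, 2 clusters,
    # points 0 and 1 must share a cluster, points 1 and 2 must not.
    ml = [(0, 1)]
    cl = [(1, 2)]
    feasible = count_feasible(4, 2, ml, cl)
    total = 2 ** 4
    print(f"{feasible} of {total} assignments are feasible")
```

With only CL constraints, this count is the number of proper k-colorings of the constraint graph, that is, its chromatic polynomial evaluated at k. A small feasible fraction suggests a heavily restricted search space, and a count of zero means the constraint set cannot be satisfied with k clusters.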