Using instance-level constraints in agglomerative hierarchical clustering: theoretical and empirical results

Authors:
Ian Davidson; S. S. Ravi
Affiliations:
Department of Computer Science, The University of California - Davis, Davis, USA 95616; Department of Computer Science, University at Albany - State University of New York, Albany, USA 12222
Venue:
Data Mining and Knowledge Discovery
Year:
2009

Abstract

Clustering with constraints is a powerful method that allows users to specify background knowledge and the expected cluster properties. Significant work has explored the incorporation of instance-level constraints into non-hierarchical clustering, but not into hierarchical clustering algorithms. In this paper, we present a formal complexity analysis of the problem and show that constraints can be used to improve not only the quality of the resulting dendrogram but also the efficiency of the algorithms. This is particularly important because many agglomerative-style algorithms have running times that are quadratic (or faster-growing) functions of the number of instances to be clustered. We present several bounds on the improvement in the running times of algorithms obtainable using constraints.
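
As a rough illustration of how instance-level constraints can reduce the work an agglomerative algorithm does, the sketch below assumes the two common constraint types, must-link and cannot-link pairs, and uses single linkage with Euclidean distance. It is not the algorithm from the paper: the function name `constrained_single_linkage` and its parameters are hypothetical, and the loop is written for clarity rather than speed.

```python
from itertools import combinations


def constrained_single_linkage(points, must_link, cannot_link):
    """Greedy single-linkage merging that honours pairwise constraints.

    points: list of numeric tuples.
    must_link / cannot_link: lists of (i, j) index pairs (assumed constraint types).
    Returns (merges, clusters): the merges performed and the clusters left at the end.
    """
    # Union-find gives the transitive closure of the must-link pairs.
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in must_link:
        parent[find(a)] = find(b)

    # A cannot-link pair inside one must-link group cannot be satisfied.
    for a, b in cannot_link:
        if find(a) == find(b):
            raise ValueError("constraints are inconsistent")

    # Start from the must-link groups instead of from singletons,
    # so fewer clusters (and fewer merge steps) remain.
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), set()).add(i)
    clusters = [frozenset(g) for g in groups.values()]

    def dist(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(
            sum((x - y) ** 2 for x, y in zip(points[i], points[j])) ** 0.5
            for i in c1
            for j in c2
        )

    def violates(c1, c2):
        merged = c1 | c2
        return any(a in merged and b in merged for a, b in cannot_link)

    merges = []
    while True:
        candidates = [
            (dist(c1, c2), c1, c2)
            for c1, c2 in combinations(clusters, 2)
            if not violates(c1, c2)
        ]
        if not candidates:
            break  # one cluster left, or every remaining merge is blocked
        d, c1, c2 = min(candidates, key=lambda t: t[0])
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        merges.append((sorted(c1), sorted(c2), d))
    return merges, clusters
```

In this sketch, must-link pairs shrink the starting set of clusters, and cannot-link pairs can stop the merging before a single root is reached, so the dendrogram may be shorter than in the unconstrained case. Both effects reduce the number of merge steps, which is the intuition behind efficiency gains of the kind the abstract describes.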