Agglomerative hierarchical clustering with constraints: theoretical and empirical results

Authors:
Ian Davidson; S. S. Ravi
Affiliations:
Department of Computer Science, University at Albany – State University of New York, Albany, NY; Department of Computer Science, University at Albany – State University of New York, Albany, NY
Venue:
PKDD'05 Proceedings of the 9th European conference on Principles and Practice of Knowledge Discovery in Databases
Year:
2005


Abstract

We explore the use of instance-level and cluster-level constraints with agglomerative hierarchical clustering. Though previous work has illustrated the benefits of using constraints for non-hierarchical clustering, their application to hierarchical clustering is not straightforward for two primary reasons. First, some constraint combinations make the feasibility problem (does at least one feasible solution exist?) NP-complete. Second, some constraint combinations, when used with traditional agglomerative algorithms, can cause dendrogram construction to stop prematurely at a dead-end solution even though other feasible solutions with significantly fewer clusters exist. When constraints lead to efficiently solvable feasibility problems and standard agglomerative algorithms do not give rise to dead-end solutions, we empirically illustrate the benefits of using constraints to improve cluster purity and average distortion. Furthermore, we introduce the new γ constraint and use it in conjunction with the triangle inequality to considerably improve the efficiency of agglomerative clustering.
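The dead-end behaviour described in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: it assumes a plain closest-pair (single-linkage) agglomerative loop on one-dimensional points that simply refuses any merge violating a cannot-link constraint. The point names, coordinates, and the helper names `violates`, `is_feasible`, and `constrained_agglomerative` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def violates(c1, c2, cannot_link):
    """True if merging clusters c1 and c2 would join a cannot-link pair."""
    return any((x, y) in cannot_link or (y, x) in cannot_link
               for x in c1 for y in c2)


def is_feasible(clustering, cannot_link):
    """True if no cluster contains both endpoints of a cannot-link pair."""
    return all((x, y) not in cannot_link and (y, x) not in cannot_link
               for cluster in clustering
               for x, y in combinations(cluster, 2))


def constrained_agglomerative(points, cannot_link):
    """Greedy closest-pair merging that skips constraint-violating merges.

    Returns the list of clusterings visited, from singletons to the
    clustering at which no further legal merge exists.
    """
    clusters = [frozenset([name]) for name in points]
    history = [list(clusters)]
    while True:
        best = None
        for c1, c2 in combinations(clusters, 2):
            if violates(c1, c2, cannot_link):
                continue
            # Single-linkage distance between the two clusters.
            d = min(abs(points[x] - points[y]) for x in c1 for y in c2)
            if best is None or d < best[0]:
                best = (d, c1, c2)
        if best is None:  # no legal merge remains: the dendrogram stops here
            return history
        _, c1, c2 = best
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
        history.append(list(clusters))


# Hypothetical data: a and d are close, but the cannot-link chain a-b, b-c, c-d
# means that merging them first blocks every later merge.
points = {"a": 0.0, "d": 1.0, "b": 10.0, "c": 20.0}
cannot_link = {("a", "b"), ("b", "c"), ("c", "d")}

history = constrained_agglomerative(points, cannot_link)
dead_end = history[-1]
alternative = [frozenset("ac"), frozenset("bd")]
```

In this toy instance the greedy loop merges `a` and `d` first and then cannot merge anything else, so it stops with three clusters, whereas the clustering `{a, c}, {b, d}` satisfies all constraints with only two. That gap is the kind of premature stop the abstract refers to as a dead-end solution.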