Non-redundant clustering with conditional ensembles

Authors:
David Gondek; Thomas Hofmann
Affiliations:
IBM T. J. Watson Research Center, Hawthorne, NY; Fraunhofer IPSI, Darmstadt, Germany
Venue:
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Year:
2005

Abstract

Data may often contain multiple plausible clusterings. To discover a clustering that is useful to the user, constrained clustering techniques have been proposed to guide the search. Typically, these techniques assume background knowledge in the form of explicit information about the desired clustering. In contrast, we consider the setting in which the background knowledge is instead about an undesired clustering. Such knowledge may be obtained from an existing classification or from a precedent algorithm. The problem is then to find a novel, "orthogonal" clustering in the data. We present a general algorithmic framework that uses cluster ensemble methods to solve this problem. One key advantage of this approach is that it treats the base clustering method as a black box, allowing the practitioner to select the most appropriate clustering method for the domain. We present experimental results on synthetic and text data which establish the competitiveness of this framework.
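The abstract describes the framework only at a high level. As a rough illustration of one plausible reading of the "conditional ensemble" idea, here is a minimal Python sketch. Several choices are our assumptions, not the paper's: plain k-means stands in for the black-box base method; each local clustering is fit inside one cluster of the known (undesired) clustering and then extended to all points by nearest center; and the local clusterings are combined by running k-means on a co-association matrix. The names `kmeans`, `nearest_center`, and `conditional_ensemble` are hypothetical.

```python
import numpy as np


def nearest_center(X, centers):
    """Index of the closest center (squared Euclidean) for every row of X."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point seeding.

    Hypothetical stand-in for the black-box base clustering method.
    """
    X = np.asarray(X, dtype=float)
    centers = [X[0]]
    for _ in range(1, k):
        # seed the next center at the point farthest from all chosen centers
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = nearest_center(X, centers)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return nearest_center(X, centers), centers


def conditional_ensemble(X, known, k):
    """Sketch: cluster within each known group, then combine by consensus."""
    X = np.asarray(X, dtype=float)
    known = np.asarray(known)
    groups = np.unique(known)
    coassoc = np.zeros((len(X), len(X)))
    for z in groups:
        # inside one known group the undesired split is constant,
        # so the base clusterer has to pick up other structure
        _, centers = kmeans(X[known == z], k)
        # assumption: extend the local clustering to all points by nearest center
        labels = nearest_center(X, centers)
        coassoc += (labels[:, None] == labels[None, :]).astype(float)
    coassoc /= len(groups)  # fraction of local clusterings that co-cluster i, j
    # assumption: consensus by k-means on the rows of the co-association matrix
    consensus, _ = kmeans(coassoc, k)
    return consensus


# Toy data: four blobs at the corners of a square.
rng = np.random.default_rng(0)
corners = np.array([[0, 0], [0, 10], [10, 0], [10, 10]], dtype=float)
X = np.vstack([c + rng.normal(scale=0.5, size=(25, 2)) for c in corners])
known = (X[:, 0] > 5).astype(int)  # undesired clustering: left vs. right
labels = conditional_ensemble(X, known, k=2)
print(np.bincount(labels))
```

On this toy data the known clustering separates left from right. Each local k-means run sees only one side, so both runs split bottom from top, and the consensus recovers that bottom/top clustering, which is independent of the known one. Swapping in a different base method only requires some way to assign unseen points to its clusters; here that extension step is nearest center, which is again our assumption.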