Clustering Aggregation

Authors: Aristides Gionis; Heikki Mannila; Panayiotis Tsaparas
Affiliations: University of Helsinki (all three authors)
Venue: ICDE '05: Proceedings of the 21st International Conference on Data Engineering
Year: 2005

Abstract

We consider the following problem: given a set of clusterings, find a clustering that agrees as much as possible with the given clusterings. This problem, clustering aggregation, appears naturally in various contexts. For example, clustering categorical data is an instance of the problem: each categorical variable can be viewed as a clustering of the input rows. Moreover, clustering aggregation can be used as a meta-clustering method to improve the robustness of clusterings. The problem formulation does not require a priori information about the number of clusters, and it gives a natural way of handling missing values. We give a formal statement of the clustering-aggregation problem, we discuss related work, and we suggest a number of algorithms. For several of the methods we provide theoretical guarantees on the quality of the solutions. We also show how sampling can be used to scale the algorithms to large data sets. We give an extensive empirical evaluation demonstrating the usefulness of the problem and of the solutions.
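The abstract does not spell out the objective or the algorithms, so the following is only a minimal Python sketch under stated assumptions, not the authors' method. It assumes that agreement is measured by counting object pairs that two clusterings treat differently (together in one, apart in the other), treats each categorical column as one clustering of the rows, skips missing values pair by pair, and searches for an aggregate with a simple local-search heuristic. The function names (`disagreement`, `distance_matrix`, `aggregation_cost`, `local_search`, `as_partition`), the toy data, and the 0.5 charge for pairs with no information are illustrative assumptions; the sketch carries none of the paper's quality guarantees.

```python
from itertools import combinations

# A clustering is a list of labels, one per object; None marks a missing value.
# Assumption (not stated in the abstract): agreement is measured pairwise.


def disagreement(c1, c2):
    """Count object pairs that one clustering puts together and the other apart.
    Pairs with a missing label in either clustering are skipped."""
    count = 0
    for u, v in combinations(range(len(c1)), 2):
        if None in (c1[u], c1[v], c2[u], c2[v]):
            continue
        if (c1[u] == c1[v]) != (c2[u] == c2[v]):
            count += 1
    return count


def distance_matrix(clusterings):
    """X[u][v] = fraction of input clusterings that separate u and v,
    computed only over the clusterings that label both objects."""
    n = len(clusterings[0])
    X = [[0.0] * n for _ in range(n)]
    for u, v in combinations(range(n), 2):
        votes = [c[u] != c[v] for c in clusterings
                 if c[u] is not None and c[v] is not None]
        # Assumption: with no information about a pair, charge 0.5 either way.
        X[u][v] = X[v][u] = sum(votes) / len(votes) if votes else 0.5
    return X


def aggregation_cost(labels, X):
    """Pay X[u][v] for putting u and v together and 1 - X[u][v] for separating them."""
    return sum(X[u][v] if labels[u] == labels[v] else 1 - X[u][v]
               for u, v in combinations(range(len(labels)), 2))


def local_search(X):
    """Hypothetical local-search heuristic: start from singletons and move one
    object at a time to the cluster (or a fresh one) that lowers the cost most.
    Every move strictly lowers the total cost, so the loop terminates."""
    n = len(X)
    labels = list(range(n))

    def cost_of(u, k):
        # Cost of all pairs involving u if u is placed in cluster k.
        return sum(X[u][v] if labels[v] == k else 1 - X[u][v]
                   for v in range(n) if v != u)

    improved = True
    while improved:
        improved = False
        for u in range(n):
            # No cluster count is fixed: a fresh cluster id is always a candidate.
            candidates = set(labels) | {max(labels) + 1}
            best = min(candidates, key=lambda k: cost_of(u, k))
            if cost_of(u, best) < cost_of(u, labels[u]) - 1e-12:
                labels[u] = best
                improved = True
    return labels


def as_partition(labels):
    """Canonical form of a clustering: a set of frozensets of object indices."""
    groups = {}
    for obj, lab in enumerate(labels):
        groups.setdefault(lab, set()).add(obj)
    return {frozenset(g) for g in groups.values()}


# Usage on hypothetical toy data: each categorical column is one clustering of the rows.
rows = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "square"),
    ("blue", None, "square"),  # missing value
]
clusterings = [list(col) for col in zip(*rows)]
X = distance_matrix(clusterings)
result = local_search(X)
print(sorted(sorted(g) for g in as_partition(result)))  # prints [[0, 1, 2], [3, 4, 5]]
```

The number of output clusters is never supplied; the search opens a new cluster only when doing so lowers the cost, and the missing size of the last row simply drops out of the pairs it touches. Because the pairwise matrix grows quadratically with the number of objects, this direct version only suits small inputs.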