Clustering aggregation

Authors:
Aristides Gionis; Heikki Mannila; Panayiotis Tsaparas
Affiliations:
Yahoo! Research Labs, Barcelona, Spain; University of Helsinki and Helsinki University of Technology; Microsoft Search Labs
Venue:
ACM Transactions on Knowledge Discovery from Data (TKDD)
Year:
2007

Abstract

We consider the following problem: given a set of clusterings, find a single clustering that agrees as much as possible with the input clusterings. This problem, clustering aggregation, appears naturally in various contexts. For example, clustering categorical data is an instance of the clustering aggregation problem; each categorical attribute can be viewed as a clustering of the input rows where rows are grouped together if they take the same value on that attribute. Clustering aggregation can also be used as a metaclustering method to improve the robustness of clustering by combining the output of multiple algorithms. Furthermore, the problem formulation does not require a priori information about the number of clusters; it is naturally determined by the optimization function. In this article, we give a formal statement of the clustering aggregation problem, and we propose a number of algorithms. Our algorithms make use of the connection between clustering aggregation and the problem of correlation clustering. Although the problems we consider are NP-hard, for several of our methods, we provide theoretical guarantees on the quality of the solutions. Our work provides the best deterministic approximation algorithm for the variation of the correlation clustering problem we consider. We also show how sampling can be used to scale the algorithms for large datasets. We give an extensive empirical evaluation demonstrating the usefulness of the problem and of the solutions.
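To make the problem statement concrete, here is a minimal Python sketch. It assumes that "agreement" is measured by counting pairs of objects that one clustering places together and the other separates; the abstract does not spell out the measure, so treat this as an illustrative assumption. The function names (`disagreement`, `total_disagreement`, `best_input_clustering`) and the toy table are hypothetical. The baseline shown simply returns the input clustering with the smallest total disagreement against all inputs; it is not presented as one of the authors' algorithms.

```python
from itertools import combinations


def disagreement(c1, c2):
    """Count object pairs that one clustering joins and the other splits.

    Each clustering is a list of labels, one label per object (assumed encoding).
    """
    if len(c1) != len(c2):
        raise ValueError("clusterings must label the same objects")
    count = 0
    for i, j in combinations(range(len(c1)), 2):
        together_in_c1 = c1[i] == c1[j]
        together_in_c2 = c2[i] == c2[j]
        if together_in_c1 != together_in_c2:
            count += 1
    return count


def total_disagreement(candidate, clusterings):
    """Sum of pairwise disagreements between a candidate and every input."""
    return sum(disagreement(candidate, c) for c in clusterings)


def best_input_clustering(clusterings):
    """Baseline: return the input clustering that disagrees least with all inputs."""
    return min(clusterings, key=lambda c: total_disagreement(c, clusterings))


# Toy categorical table (made-up data). Each column is one attribute, and
# each attribute is read as a clustering: rows with equal values are grouped.
rows = [
    ("red", "small", "yes"),
    ("red", "small", "yes"),
    ("blue", "small", "no"),
    ("blue", "large", "no"),
    ("blue", "large", "no"),
]
clusterings = [list(column) for column in zip(*rows)]

best = best_input_clustering(clusterings)
print(best, total_disagreement(best, clusterings))
```

Note that the number of clusters is never passed in: it falls out of whichever candidate minimizes the total disagreement, which mirrors the abstract's remark that the number of clusters is determined by the optimization function. The pairwise loop is quadratic in the number of objects, so a sketch like this only suits small inputs.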