Pairwise similarity for cluster ensemble problem: link-based and approximate approaches

Authors:
Natthakan Iam-On;Tossapon Boongoen
Affiliations:
School of Information Technology, Mae Fah Luang University, Chiang Rai, Thailand;Department of Mathematics and Computer Science, Royal Thai Air Force Academy, Bangkok, Thailand
Venue:
Transactions on Large-Scale Data- and Knowledge-Centered Systems IX
Year:
2013

Abstract

Cluster ensemble methods have emerged as powerful techniques that aggregate several input data clusterings into a single output clustering with improved robustness and stability. In particular, link-based similarity techniques have recently been introduced and shown to outperform the conventional co-association method. Their potential and applicability are, however, limited by their underlying time complexity. In light of this shortcoming, this paper presents two approximate approaches that mitigate the problem of time complexity: the approximate algorithm approach (Approximate SimRank Based Similarity matrix) and the approximate data approach (Prototype-based cluster ensemble model). The first approach decreases the computational requirement of the existing link-based technique; the second reduces the size of the problem by finding a smaller, representative, approximate dataset, derived by a density-biased sampling technique. The advantages of both approximate approaches are empirically demonstrated over 22 datasets (both artificial and real data), using statistical comparisons of performance (at the 95% confidence level) under three well-known validity criteria. The results of these experiments suggest that approximate techniques can efficiently help scale up the application of link-based similarity methods to a wider range of data sizes.
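
For readers unfamiliar with the conventional co-association method that the abstract uses as its baseline, the following is a minimal sketch, not the authors' link-based or approximate techniques. It assumes each base clustering is given as a list of cluster labels, one per data point; the function name `co_association` and the toy `ensemble` are hypothetical and chosen only for illustration. The similarity of two points is taken to be the fraction of base clusterings that place them in the same cluster.

```python
def co_association(partitions):
    """Return the n x n co-association matrix for a list of label vectors.

    partitions: list of base clusterings, each a list of n cluster labels.
    Entry [i][j] is the fraction of base clusterings in which points i and j
    share a cluster label.
    """
    if not partitions:
        raise ValueError("need at least one base clustering")
    n = len(partitions[0])
    if any(len(labels) != n for labels in partitions):
        raise ValueError("all base clusterings must label the same n points")

    # Count, for every pair of points, how many clusterings group them together.
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1

    # Normalise the counts by the ensemble size.
    m = len(partitions)
    return [[c / m for c in row] for row in counts]


# Toy ensemble (illustrative data only): three clusterings of four points.
ensemble = [
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]
ca = co_association(ensemble)
```

Even this simple baseline does work quadratic in the number of data points for every base clustering, which gives a sense of why pairwise-similarity approaches become costly on larger datasets.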