Pairwise similarity for cluster ensemble problem: link-based and approximate approaches

  • Authors: Natthakan Iam-On; Tossapon Boongoen
  • Affiliations: School of Information Technology, Mae Fah Luang University, Chiang Rai, Thailand; Department of Mathematics and Computer Science, Royal Thai Air Force Academy, Bangkok, Thailand
  • Venue: Transactions on Large-Scale Data- and Knowledge-Centered Systems IX
  • Year: 2013

Abstract

Cluster ensemble methods have emerged as powerful techniques that aggregate several input data clusterings into a single output clustering with improved robustness and stability. In particular, link-based similarity techniques have recently been introduced, with superior performance to the conventional co-association method. Their potential and applicability are, however, limited by their underlying time complexity. In light of this shortcoming, this paper presents two approximate approaches that mitigate the problem of time complexity: the approximate algorithm approach (Approximate SimRank Based Similarity matrix) and the approximate data approach (Prototype-based cluster ensemble model). The first approach decreases the computational requirement of the existing link-based technique; the second reduces the size of the problem by finding a smaller, representative, approximate dataset, derived by a density-biased sampling technique. The advantages of both approximate approaches are empirically demonstrated over 22 datasets (both artificial and real data), with statistical comparisons of performance (at the 95% confidence level) using three well-known validity criteria. The results of these experiments suggest that approximate techniques can efficiently help scale up the application of link-based similarity methods to a wider range of data sizes.
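
For readers unfamiliar with the conventional co-association method that the abstract uses as a baseline, the following is a minimal illustrative sketch, not the authors' code. It assumes each base clustering is given as a list of integer labels over the same data points, and it scores each pair of points by the fraction of base clusterings that place them in the same cluster. The function name `co_association` and the toy input are hypothetical. The link-based and approximate techniques proposed in the paper are not reproduced here.

```python
from typing import List, Sequence


def co_association(labelings: Sequence[Sequence[int]]) -> List[List[float]]:
    """Return the pairwise co-association matrix for a set of base clusterings.

    Entry [i][j] is the fraction of base clusterings in which points i and j
    share a cluster label. (Hypothetical helper, for illustration only.)
    """
    if not labelings:
        raise ValueError("need at least one base clustering")
    n = len(labelings[0])
    if any(len(labels) != n for labels in labelings):
        raise ValueError("all clusterings must label the same points")

    # Count, for every pair, how many clusterings co-cluster the two points.
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1

    # Normalise the counts by the ensemble size.
    m = len(labelings)
    return [[c / m for c in row] for row in counts]


# Toy ensemble: three base clusterings of four points (assumed data).
base_clusterings = [
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]
similarity = co_association(base_clusterings)
```

The double loop over all pairs makes the cost quadratic in the number of data points for every base clustering. The resulting matrix can be passed to any similarity-based clustering algorithm to obtain the final partition.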