Adaptive evidence accumulation clustering using the confidence of the objects' assignments

  • Authors:
  • João M. M. Duarte; Ana L. N. Fred; F. Jorge F. Duarte

  • Affiliations:
  • GECAD - Knowledge Engineering and Decision Support Group, Institute of Engineering, Polytechnic of Porto (ISEP/IPP), Porto, Portugal, Instituto de Telecomunicações, Instituto Superior T ...; Instituto de Telecomunicações, Instituto Superior Técnico, Lisboa, Portugal; GECAD - Knowledge Engineering and Decision Support Group, Institute of Engineering, Polytechnic of Porto (ISEP/IPP), Porto, Portugal

  • Venue:
  • PAKDD'12 Proceedings of the 2012 Pacific-Asia conference on Emerging Trends in Knowledge Discovery and Data Mining
  • Year:
  • 2012

Abstract

Ensemble methods are known to increase the performance of learning algorithms in both supervised and unsupervised learning. Boosting algorithms are among the most successful supervised ensemble methods. These algorithms incrementally build an ensemble of classifiers by focusing on previously misclassified objects while training the current classifier. In this paper we propose an extension to the Evidence Accumulation Clustering method inspired by Boosting algorithms. In supervised learning, identifying misclassified objects is trivial because the label of each object is known; in unsupervised learning the labels are unknown, which makes it difficult to identify the objects on which the clustering algorithm should focus. The proposed approach uses the information contained in the co-association matrix to estimate the degree of confidence of the assignment of each object to its cluster. The degree of confidence is then used to select which objects should be emphasized in the learning process of the clustering algorithm. New consensus partition validity measures, based on the notion of degree of confidence, are also proposed. To evaluate the performance of our approaches, experiments were performed on several artificial and real data sets. The results show that the adaptive clustering ensemble method and the consensus partition validity measures help to improve the quality of data clustering.
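
The idea of reading assignment confidence from a co-association matrix can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: the function names, the confidence rule (mean co-association of an object with the other members of its consensus cluster), and the weighting rule (low confidence means high sampling probability) are hypothetical and are not the authors' definitions.

```python
import numpy as np

def co_association(partitions):
    """Fraction of partitions in which each pair of objects shares a cluster."""
    partitions = np.asarray(partitions)      # shape: (n_partitions, n_objects)
    # same[p, i, j] is True when objects i and j share a cluster in partition p
    same = partitions[:, :, None] == partitions[:, None, :]
    return same.mean(axis=0)

def assignment_confidence(coassoc, labels):
    """Illustrative confidence (an assumption, not the paper's measure):
    mean co-association of each object with the other members of its cluster."""
    labels = np.asarray(labels)
    conf = np.ones(len(labels))              # singleton clusters default to 1.0
    for i in range(len(labels)):
        mates = labels == labels[i]
        mates[i] = False                     # exclude the object itself
        if mates.any():
            conf[i] = coassoc[i, mates].mean()
    return conf

def sampling_weights(conf, eps=1e-9):
    """Hypothetical emphasis rule: lower confidence gives a higher weight."""
    w = (1.0 - conf) + eps
    return w / w.sum()

# Three toy base partitions of six objects, and an assumed consensus partition.
partitions = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
consensus = [0, 0, 0, 1, 1, 1]

C = co_association(partitions)
conf = assignment_confidence(C, consensus)
weights = sampling_weights(conf)
```

In this toy example, objects 2 and 3 switch clusters across the base partitions, so they receive the lowest confidence and the largest weights. An adaptive scheme in the spirit of the abstract could use such weights to emphasize those objects when producing the next base partition.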