Exploiting unlabeled data to enhance ensemble diversity

  • Authors:
  • Min-Ling Zhang; Zhi-Hua Zhou

  • Affiliations:
  • School of Computer Science and Engineering, Southeast University, Nanjing, China 210096 and National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China 210093; National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China 210093

  • Venue:
  • Data Mining and Knowledge Discovery
  • Year:
  • 2013

Abstract

Ensemble learning learns from the training data by generating an ensemble of multiple base learners. It is well known that, to construct a good ensemble with strong generalization ability, the base learners should be both accurate and diverse. In this paper, unlabeled data is exploited to facilitate ensemble learning by helping to augment the diversity among the base learners. Specifically, a semi-supervised ensemble method named UDEED (Unlabeled Data to Enhance Ensemble Diversity) is proposed. Existing semi-supervised ensemble methods use unlabeled data by estimating error-prone pseudo-labels for it and adding the pseudo-labeled examples to the labeled data to improve the base learners' accuracies. In contrast, UDEED works by maximizing the accuracies of the base learners on labeled data while maximizing the diversity among them on unlabeled data. Extensive experiments are conducted on 20 regular-scale and five large-scale data sets, with either few or abundant labeled data. Experimental results show that UDEED can effectively utilize unlabeled data for ensemble learning via diversity augmentation and is highly competitive with well-established semi-supervised ensemble methods.
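The abstract states UDEED's core idea (fit the base learners to the labeled data while pushing them to disagree on the unlabeled data) but not its exact formulation. The following is a minimal sketch of that kind of combined objective under assumptions made only for illustration: each base learner is a linear model squashed into [-1, 1] by `tanh`, accuracy is measured by squared error against labels in {-1, +1}, agreement is the mean pairwise product of outputs on unlabeled points, and a hypothetical trade-off weight `gamma` balances the two terms. The function names, the toy data, and the finite-difference gradient descent (chosen only for brevity) are hypothetical and are not the paper's loss, diversity measure, or optimizer.

```python
import math
import random

# Hypothetical setup (not taken from the paper): each base learner is a linear
# model whose score is squashed into [-1, 1] by tanh; labels are -1 or +1.


def predict(w, x):
    """Output of one base learner with weight vector w on feature vector x."""
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))


def objective(weights, labeled, unlabeled, gamma):
    """Accuracy term on labeled data plus gamma times an agreement term on unlabeled data.

    Both terms are minimized: the first fits the labels, the second pushes
    pairs of learners to disagree (i.e. become more diverse) on unlabeled points.
    """
    m = len(weights)
    # Mean squared error of every learner on every labeled example.
    loss = sum((predict(w, x) - y) ** 2 for w in weights for x, y in labeled)
    loss /= m * len(labeled)
    # Mean product of outputs over all learner pairs and unlabeled points;
    # it approaches +1 when two learners agree with full confidence everywhere.
    pairs = [(p, q) for p in range(m) for q in range(p + 1, m)]
    agree = sum(predict(weights[p], x) * predict(weights[q], x)
                for p, q in pairs for x in unlabeled)
    agree /= len(pairs) * len(unlabeled)
    return loss + gamma * agree


def init_weights(m, dim, seed=0):
    """Small random starting weights so the learners do not start identical."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(m)]


def train(weights, labeled, unlabeled, gamma=0.1, lr=0.1, steps=200, eps=1e-6):
    """Plain gradient descent with forward-difference gradients (for brevity only)."""
    weights = [list(w) for w in weights]  # work on a copy of the starting point
    for _ in range(steps):
        base = objective(weights, labeled, unlabeled, gamma)
        grads = []
        for w in weights:
            g = []
            for j in range(len(w)):
                w[j] += eps  # perturb one coordinate, measure the change, undo it
                g.append((objective(weights, labeled, unlabeled, gamma) - base) / eps)
                w[j] -= eps
            grads.append(g)
        for w, g in zip(weights, grads):
            for j in range(len(w)):
                w[j] -= lr * g[j]
    return weights


def ensemble_predict(weights, x):
    """Combine the learners by the sign of their averaged outputs."""
    return 1 if sum(predict(w, x) for w in weights) >= 0 else -1


# Hypothetical toy data: the first feature is a constant bias term.
labeled = [([1.0, 2.0], 1), ([1.0, 1.5], 1), ([1.0, -2.0], -1), ([1.0, -1.5], -1)]
unlabeled = [[1.0, 0.3], [1.0, -0.4], [1.0, 1.0], [1.0, -1.2]]

start = init_weights(m=3, dim=2)
trained = train(start, labeled, unlabeled)
print(objective(start, labeled, unlabeled, 0.1), objective(trained, labeled, unlabeled, 0.1))
print([ensemble_predict(trained, x) for x, _ in labeled])
```

In this sketch, raising `gamma` trades fit on the labeled data for more disagreement on the unlabeled points; with `gamma = 0` the objective decouples, each learner is trained independently, and any diversity comes only from the random starting weights.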