B-EM: a classifier incorporating bootstrap with EM approach for data mining

  • Authors:
  • Xintao Wu;Jianping Fan;Kalpathi R. Subramanian

  • Affiliations:
  • UNC at Charlotte, Charlotte, NC;UNC at Charlotte, Charlotte, NC;UNC at Charlotte, Charlotte, NC

  • Venue:
  • Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2002

Abstract

This paper investigates the problem of augmenting labeled data with unlabeled data to improve classification accuracy. This is significant for many applications, such as image classification, where obtaining classification labels is expensive while large numbers of unlabeled examples are easily available. We investigate an Expectation Maximization (EM) algorithm for learning from labeled and unlabeled data. Unlabeled data boosts learning accuracy because it provides information about the joint probability distribution. A theoretical argument shows that the more unlabeled examples are combined in learning, the more accurate the result. We then introduce the B-EM algorithm, which combines EM with the bootstrap method, to exploit large amounts of unlabeled data while avoiding prohibitive I/O cost. Experimental results on both synthetic and real data sets show that the proposed approach performs satisfactorily.
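
The abstract does not give the details of B-EM, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes one diagonal-covariance Gaussian per class, keeps the labeled examples' class memberships fixed while EM re-estimates soft memberships for the unlabeled ones, and runs EM on several small bootstrap samples of the unlabeled pool (so each run touches only a fraction of the data) before averaging the fitted parameters. The function names, the Gaussian model, and the averaging step are all assumptions made for illustration.

```python
import numpy as np

def log_gauss(X, mean, var):
    # Log-density of a diagonal-covariance Gaussian at each row of X.
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (X - mean) ** 2 / var, axis=1)

def class_log_scores(X, priors, means, variances):
    # Column k holds log prior_k + log p(x | class k).
    return np.stack(
        [np.log(priors[k]) + log_gauss(X, means[k], variances[k])
         for k in range(len(priors))],
        axis=1,
    )

def semi_supervised_em(X_lab, y_lab, X_unl, n_classes, n_iter=30):
    # Assumption: every class has at least one labeled example.
    n_lab = len(X_lab)
    X = np.vstack([X_lab, X_unl])
    # Labeled rows keep one-hot responsibilities; unlabeled rows start at
    # zero so the first M-step is fitted on the labeled data alone.
    R = np.vstack([np.eye(n_classes)[y_lab], np.zeros((len(X_unl), n_classes))])
    for _ in range(n_iter):
        # M-step: weighted priors, means, and per-dimension variances.
        Nk = R.sum(axis=0)
        priors = Nk / Nk.sum()
        means = (R.T @ X) / Nk[:, None]
        variances = np.stack(
            [(R[:, k, None] * (X - means[k]) ** 2).sum(axis=0) / Nk[k]
             for k in range(n_classes)]
        ) + 1e-6  # small floor keeps variances strictly positive
        # E-step: update soft class memberships of the unlabeled rows only.
        if len(X_unl):
            log_p = class_log_scores(X_unl, priors, means, variances)
            log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
            P = np.exp(log_p)
            R[n_lab:] = P / P.sum(axis=1, keepdims=True)
    return priors, means, variances

def bootstrap_em(X_lab, y_lab, X_unl, n_classes,
                 n_rounds=10, sample_size=200, seed=0):
    # Hypothetical combination step: average the parameters fitted on
    # each bootstrap sample (drawn with replacement) of the unlabeled pool.
    rng = np.random.default_rng(seed)
    fits = []
    for _ in range(n_rounds):
        idx = rng.integers(0, len(X_unl), size=sample_size)
        fits.append(semi_supervised_em(X_lab, y_lab, X_unl[idx], n_classes))
    return tuple(np.mean([f[i] for f in fits], axis=0) for i in range(3))

def predict(X, priors, means, variances):
    # Assign each row to the class with the highest posterior score.
    return class_log_scores(X, priors, means, variances).argmax(axis=1)
```

Because the labeled examples pin each Gaussian component to a class, the components cannot swap identities between bootstrap rounds, which is what makes simple parameter averaging reasonable in this sketch.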