Learning Instance Weighted Naive Bayes from labeled and unlabeled data

  • Authors:
  • Liangxiao Jiang

  • Affiliations:
  • Department of Computer Science, China University of Geosciences, Wuhan 430074, China

  • Venue:
  • Journal of Intelligent Information Systems
  • Year:
  • 2012

Abstract

In real-world data mining applications, unlabeled instances are often abundant while labeled instances are scarce. Semi-supervised learning, which attempts to benefit from large amounts of unlabeled data together with the labeled data, has therefore attracted much attention from researchers. In this paper, we propose a very fast yet highly effective semi-supervised learning algorithm, which we call Instance Weighted Naive Bayes (IWNB). IWNB first trains a naive Bayes classifier on the labeled instances only and uses it to estimate the class membership probabilities of the unlabeled instances. These estimated probabilities are then used to label and weight the unlabeled instances. Finally, a naive Bayes classifier is trained again on both the originally labeled data and the newly labeled, weighted unlabeled data. Our experimental results on a large number of UCI data sets show that IWNB often improves the classification accuracy of the original naive Bayes when the available labeled data are very limited.
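
The sketch below is a rough illustration of the four-step procedure the abstract describes, not the authors' implementation. It assumes scikit-learn's GaussianNB (the paper's naive Bayes may instead handle nominal attributes), and the function name iwnb_fit is purely illustrative.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

def iwnb_fit(X_labeled, y_labeled, X_unlabeled):
    """Sketch of the IWNB procedure described in the abstract."""
    # Step 1: train naive Bayes on the labeled instances only.
    nb = GaussianNB().fit(X_labeled, y_labeled)

    # Step 2: estimate class membership probabilities of the unlabeled instances.
    proba = nb.predict_proba(X_unlabeled)

    # Step 3: label each unlabeled instance with its most probable class and
    # weight it by that probability; labeled instances keep weight 1.
    pseudo_labels = nb.classes_[proba.argmax(axis=1)]
    weights_unlabeled = proba.max(axis=1)

    # Step 4: retrain on the originally labeled data plus the newly labeled,
    # weighted unlabeled data.
    X_all = np.vstack([X_labeled, X_unlabeled])
    y_all = np.concatenate([y_labeled, pseudo_labels])
    w_all = np.concatenate([np.ones(len(y_labeled)), weights_unlabeled])
    return GaussianNB().fit(X_all, y_all, sample_weight=w_all)
```

In this reading, the instance weights simply discount unlabeled instances whose pseudo-labels the initial model is unsure about, so a confidently labeled instance contributes nearly as much as an originally labeled one while an ambiguous instance contributes little.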