Cool Blog Classification from Positive and Unlabeled Examples

  • Authors:
  • Kritsada Sriphaew;Hiroya Takamura;Manabu Okumura

  • Affiliations:
  • Precision and Intelligence Laboratory, Tokyo Institute of Technology, Yokohama, Japan 226-8503;Precision and Intelligence Laboratory, Tokyo Institute of Technology, Yokohama, Japan 226-8503;Precision and Intelligence Laboratory, Tokyo Institute of Technology, Yokohama, Japan 226-8503

  • Venue:
  • PAKDD '09 Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining
  • Year:
  • 2009

Abstract

We address the problem of cool blog classification using only positive and unlabeled examples. We propose an algorithm, called PUB, that exploits the information in unlabeled data together with the positive examples to predict whether unseen blogs are cool or not. The algorithm uses a weighting technique to assign a weight to each unlabeled example, which is assumed to be negative in the training set, and a bagging technique to obtain several weak classifiers. Each weak classifier is learned on a small training set generated by randomly sampling some positive examples and some unlabeled examples, the latter being treated as negative. Each weak classifier must either achieve an admissible performance measure, evaluated on the whole set of labeled positive examples, or have the best performance measure found within the iteration limit. A majority vote over all weak classifiers is used to predict the class of a test instance. The experimental results show that PUB can correctly predict the classes of unseen blogs, a situation that traditional learning from positive and negative examples cannot handle. The results also show that PUB outperforms other algorithms for learning from positive and unlabeled examples in the task of cool blog classification.
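The procedure outlined in the abstract can be illustrated with a minimal sketch. This is not the authors' PUB implementation: the abstract does not specify the weak learner, the weighting formula, or the performance measure. The nearest-centroid weak learner, the distance-based weights, the use of recall on the labeled positives as the measure, and all function names, parameter names, and default values below are assumptions chosen only to make the bagging, acceptance, and voting steps concrete.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points, weights):
    """Weighted mean of equal-length feature vectors."""
    total = sum(weights)
    dim = len(points[0])
    return [sum(w * p[i] for p, w in zip(points, weights)) / total
            for i in range(dim)]

def train_weak(pos, unl):
    """Nearest-centroid weak classifier (an assumed weak learner).

    Unlabeled points are treated as negatives. Assumed weighting: an
    unlabeled point far from the positive centroid gets a larger weight,
    since it is more plausibly a true negative.
    """
    c_pos = centroid(pos, [1.0] * len(pos))
    weights = [sq_dist(u, c_pos) + 1e-9 for u in unl]  # epsilon avoids a zero total
    c_neg = centroid(unl, weights)
    return lambda x: sq_dist(x, c_pos) < sq_dist(x, c_neg)

def recall_on_positives(clf, positives):
    """Fraction of labeled positives the classifier predicts as positive."""
    return sum(clf(x) for x in positives) / len(positives)

def train_pu_bagging(positives, unlabeled, n_classifiers=11, sample_size=5,
                     min_recall=0.8, max_iter=20, seed=0):
    """Build an ensemble of weak classifiers from random small samples."""
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_classifiers):
        best_clf, best_recall = None, -1.0
        for _ in range(max_iter):
            pos_s = rng.sample(positives, min(sample_size, len(positives)))
            unl_s = rng.sample(unlabeled, min(sample_size, len(unlabeled)))
            clf = train_weak(pos_s, unl_s)
            r = recall_on_positives(clf, positives)  # measured on ALL labeled positives
            if r > best_recall:
                best_clf, best_recall = clf, r
            if r >= min_recall:  # admissible: stop retrying
                break
        ensemble.append(best_clf)  # admissible, or best found within the limit
    return ensemble

def predict(ensemble, x):
    """Majority vote over all weak classifiers."""
    votes = sum(clf(x) for clf in ensemble)
    return votes * 2 > len(ensemble)

# Toy data (hypothetical): positives near (1, 1); the unlabeled set is mostly
# points near (-1, -1) plus two hidden positives.
positives = [[1.0 + 0.1 * i, 1.0 - 0.1 * i] for i in range(6)]
unlabeled = [[-1.0 - 0.1 * i, -1.0 + 0.1 * i] for i in range(8)]
unlabeled += [[1.2, 0.8], [0.9, 1.1]]

ensemble = train_pu_bagging(positives, unlabeled)
print(predict(ensemble, [1.0, 1.0]), predict(ensemble, [-1.0, -1.0]))
```

In this sketch the distance-based weights keep hidden positives in the unlabeled sample from pulling the negative centroid toward the positive region. Any other weak learner or weighting scheme could be substituted without changing the outer sampling, acceptance, and voting loop.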