Semi-supervised learning for blog classification

  • Authors:
  • Daisuke Ikeda; Hiroya Takamura; Manabu Okumura

  • Affiliations:
  • Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, Yokohama, Kanagawa, Japan; Precision and Intelligence Laboratory, Tokyo Institute of Technology, Yokohama, Kanagawa, Japan; Precision and Intelligence Laboratory, Tokyo Institute of Technology, Yokohama, Kanagawa, Japan

  • Venue:
  • AAAI'08: Proceedings of the 23rd National Conference on Artificial Intelligence - Volume 2
  • Year:
  • 2008

Abstract

Blog classification (e.g., identifying bloggers' gender or age) is one of the most interesting current problems in blog analysis. Although this problem is usually solved by applying supervised learning techniques, the large labeled dataset required for training is not always available. In contrast, unlabeled blogs can easily be collected from the web. Therefore, a semi-supervised learning method for blog classification, effectively using unlabeled data, is proposed. In this method, entries from the same blog are assumed to have the same characteristics. With this assumption, the proposed method captures the characteristics of each blog, such as writing style and topic, and uses these characteristics to improve the classification accuracy.
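
The abstract does not give the algorithm itself, so the following is only a minimal sketch of the stated assumption (entries from the same blog share the same label), not the authors' method. It swaps in a plain self-training loop around a small multinomial naive Bayes classifier: entry-level scores are summed per blog, each unlabeled blog receives one pseudo-label, and the model is retrained with those entries included. The function names, the "teen"/"adult" labels, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_nb(entries, labels):
    """Fit a multinomial naive Bayes model on whitespace-tokenized entries."""
    word_counts = defaultdict(Counter)  # label -> token counts
    class_counts = Counter(labels)      # label -> number of entries
    vocab = set()
    for text, label in zip(entries, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def log_scores(model, text):
    """Return the log joint score of one entry under every label."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        denom = sum(word_counts[label].values()) + len(vocab)
        score = math.log(class_counts[label] / total)
        for tok in text.lower().split():
            if tok in vocab:  # unseen tokens are ignored
                # Add-one (Laplace) smoothing
                score += math.log((word_counts[label][tok] + 1) / denom)
        scores[label] = score
    return scores


def predict_blog(model, blog_entries):
    """Sum entry-level scores so that the whole blog gets a single label."""
    totals = defaultdict(float)
    for text in blog_entries:
        for label, score in log_scores(model, text).items():
            totals[label] += score
    return max(totals, key=totals.get)


def self_train(labeled_blogs, unlabeled_blogs, rounds=2):
    """Self-training in which pseudo-labels are assigned per blog, not per entry."""
    entries = [e for blog, _ in labeled_blogs for e in blog]
    labels = [lab for blog, lab in labeled_blogs for _ in blog]
    model = train_nb(entries, labels)
    for _ in range(rounds):
        aug_entries, aug_labels = list(entries), list(labels)
        for blog in unlabeled_blogs:
            pseudo = predict_blog(model, blog)
            aug_entries.extend(blog)
            aug_labels.extend([pseudo] * len(blog))
        model = train_nb(aug_entries, aug_labels)
    return model


# Toy data (hypothetical): each blog is a list of entries.
labeled = [
    (["homework and exam at school", "school trip next week"], "teen"),
    (["office meeting ran late", "mortgage payment and office work"], "adult"),
]
unlabeled = [
    ["exam at school tomorrow", "concert with friends tonight"],
    ["meeting at the office", "gardening on the weekend"],
]

model = self_train(labeled, unlabeled)
print(predict_blog(model, ["concert tickets"]))
print(predict_blog(model, ["gardening tips"]))
```

In this toy setup, words such as "concert" never occur in the labeled data. They become informative only because the entry containing them inherits a pseudo-label from the other entry in the same blog, which is the kind of blog-level signal the abstract describes.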