Semi-Supervised Classification of Network Data Using Very Few Labels

  • Authors:
  • Frank Lin; William W. Cohen

  • Venue:
  • ASONAM '10: Proceedings of the 2010 International Conference on Advances in Social Networks Analysis and Mining
  • Year:
  • 2010

Abstract

Semi-supervised learning (SSL) methods aim to reduce the amount of labeled training data required by learning from both labeled and unlabeled instances. Macskassy and Provost (2007) proposed the weighted-vote relational neighbor classifier (wvRN) as a simple yet effective baseline for semi-supervised learning on network data. wvRN is similar to many recent graph-based SSL methods, has been shown to be essentially equivalent to the Gaussian-field harmonic function classifier of Zhu et al. (2003), and performs very well on several benchmark network datasets. We describe another simple and intuitive semi-supervised learning method, based on random graph walks, that outperforms wvRN by a large margin on several benchmark datasets when very few labels are available. Additionally, we show that using authoritative instances as training seeds (instances that are arguably much cheaper to label) dramatically reduces the amount of labeled data required to reach the same classification accuracy: for some existing state-of-the-art semi-supervised learning methods, the required labeled data shrinks by a factor of 50.
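The abstract contrasts three ideas: wvRN/harmonic-function propagation, a random-walk classifier, and authoritative seeds. The sketch below is a minimal, non-authoritative illustration of each on a toy graph. It assumes the random-walk method can be approximated as one random walk with restart (personalized PageRank) per class, with each node assigned the class whose walk visits it most; the abstract does not spell out the exact formulation. It also uses weighted degree as a stand-in for "authority". All names (`make_graph`, `wvrn`, `random_walk_classify`, `authoritative_seeds`) and the `damping=0.85` default are hypothetical choices made for this example.

```python
from collections import defaultdict

# Hypothetical graph format (not from the paper): undirected, node -> {neighbor: weight}.
Graph = dict[str, dict[str, float]]


def make_graph(edges: list[tuple[str, str, float]]) -> Graph:
    """Build a symmetric weighted adjacency map from (u, v, weight) triples."""
    g: dict[str, dict[str, float]] = defaultdict(dict)
    for u, v, w in edges:
        g[u][v] = w
        g[v][u] = w
    return dict(g)


def wvrn(graph: Graph, seeds: dict[str, str], n_iter: int = 100) -> dict[str, str]:
    """wvRN / harmonic-function-style propagation: each unlabeled node takes the
    weighted average of its neighbors' class scores; seed nodes stay clamped."""
    classes = sorted(set(seeds.values()))
    score = {v: {c: float(seeds.get(v) == c) for c in classes} for v in graph}
    for _ in range(n_iter):
        new = {}
        for v, nbrs in graph.items():
            if v in seeds:
                new[v] = score[v]  # labeled nodes keep their one-hot scores
                continue
            total = sum(nbrs.values())
            new[v] = {
                c: sum(w * score[u][c] for u, w in nbrs.items()) / total if total else 0.0
                for c in classes
            }
        score = new
    return {v: max(classes, key=lambda c: s[c]) for v, s in score.items()}


def random_walk_classify(graph: Graph, seeds: dict[str, str],
                         damping: float = 0.85, n_iter: int = 100) -> dict[str, str]:
    """Assumed variant: one random walk with restart (personalized PageRank) per class.
    The walker follows an edge with probability `damping`, otherwise jumps back to a
    uniformly chosen seed of that class; each node gets the class that ranks it highest."""
    classes = sorted(set(seeds.values()))
    out_w = {v: sum(nbrs.values()) for v, nbrs in graph.items()}
    ranks = {}
    for c in classes:
        seed_c = [v for v, lab in seeds.items() if lab == c]
        restart = {v: (1.0 / len(seed_c) if v in seed_c else 0.0) for v in graph}
        r = dict(restart)
        for _ in range(n_iter):
            nxt = {v: (1 - damping) * restart[v] for v in graph}
            for u, nbrs in graph.items():
                if out_w[u] == 0:
                    continue  # zero-weight node: its mass is simply dropped in this sketch
                for v, w in nbrs.items():
                    nxt[v] += damping * r[u] * w / out_w[u]
            r = nxt
        ranks[c] = r
    return {v: max(classes, key=lambda c: ranks[c][v]) for v in graph}


def authoritative_seeds(graph: Graph, labels: dict[str, str], per_class: int) -> dict[str, str]:
    """Hypothetical seed selection: within each class, label the `per_class` nodes with
    the largest weighted degree, used here as a simple proxy for 'authority'."""
    by_class: dict[str, list[str]] = defaultdict(list)
    for v, lab in labels.items():
        by_class[lab].append(v)
    seeds = {}
    for lab, nodes in by_class.items():
        nodes.sort(key=lambda v: sum(graph[v].values()), reverse=True)
        for v in nodes[:per_class]:
            seeds[v] = lab
    return seeds


if __name__ == "__main__":
    # Toy data: two 4-node cliques ("a*" and "b*") joined by one bridge edge a4-b1.
    edges = []
    for grp in ("a", "b"):
        nodes = [f"{grp}{i}" for i in range(1, 5)]
        edges += [(u, v, 1.0) for k, u in enumerate(nodes) for v in nodes[k + 1:]]
    edges.append(("a4", "b1", 1.0))
    g = make_graph(edges)
    truth = {v: v[0].upper() for v in g}  # "a*" -> "A", "b*" -> "B"
    few = {"a1": "A", "b4": "B"}  # one labeled seed per class
    print(wvrn(g, few))
    print(random_walk_classify(g, few))
    print(authoritative_seeds(g, truth, per_class=1))  # → {'a4': 'A', 'b1': 'B'}
```

On this toy graph both classifiers recover the two cliques from a single seed per class. The bridge endpoints `a4` and `b1` have the highest degree in their classes, so the degree-based proxy selects them as seeds. The per-class walk keeps a separate score vector for each label, unlike wvRN's single clamped average. That separation is one plausible reason a walk-based method can behave differently when labels are very scarce, though this toy example does not demonstrate the paper's reported gains.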