Leveraging label-independent features for classification in sparsely labeled networks: an empirical study

  • Authors:
  • Brian Gallagher; Tina Eliassi-Rad

  • Affiliations:
  • Lawrence Livermore National Laboratory, Livermore, CA; Lawrence Livermore National Laboratory, Livermore, CA

  • Venue:
  • SNAKDD'08 Proceedings of the Second international conference on Advances in social network mining and analysis
  • Year:
  • 2008

Abstract

We address the problem of within-network classification in sparsely labeled networks. Recent work has demonstrated success with statistical relational learning (SRL) and semi-supervised learning (SSL) on such problems. However, both approaches rely on the availability of labeled nodes to infer the values of missing labels. When few labels are available, the performance of these approaches can degrade. In addition, many such approaches are sensitive to the specific set of nodes labeled, so although average performance may be acceptable, the performance on a specific task may not be. We explore a complementary approach to within-network classification, based on the use of label-independent (LI) features, i.e., features calculated without using the values of class labels. While previous work has made some use of LI features, the effects of these features on classification performance have not been extensively studied. Here, we present an empirical study to better understand these effects. Through experiments on several real-world data sets, we show that the use of LI features produces classifiers that are less sensitive to specific label assignments and can lead to performance improvements of over 40% for both SRL- and SSL-based classifiers. We also examine the relative utility of individual LI features and show that, in many cases, it is a combination of a few diverse network-based structural characteristics that is most informative.
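
To make the notion of a label-independent feature concrete, the sketch below computes two structural quantities per node, degree and local clustering coefficient, using only the graph's edges and no class labels. The choice of these two features, the function name `label_independent_features`, and the toy graph are assumptions made for illustration; the abstract does not state which structural characteristics the study uses.

```python
from itertools import combinations


def label_independent_features(adjacency):
    """Return {node: (degree, clustering_coefficient)} for an undirected graph.

    `adjacency` maps each node to the set of its neighbors. No class labels
    are consulted, so the features are label-independent by construction.
    """
    features = {}
    for node, neighbors in adjacency.items():
        degree = len(neighbors)
        # Count edges that connect pairs of this node's neighbors.
        links = sum(
            1 for u, v in combinations(sorted(neighbors), 2) if v in adjacency[u]
        )
        possible = degree * (degree - 1) / 2
        clustering = links / possible if possible else 0.0
        features[node] = (degree, clustering)
    return features


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
li_features = label_independent_features(toy_graph)
```

Feature tuples like these could be appended to each node's attribute vector before it is passed to a relational or semi-supervised classifier. Because they depend only on network structure, they are available for every node regardless of how few labels are known.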