A Fast Method for Property Prediction in Graph-Structured Data from Positive and Unlabelled Examples

  • Authors:
  • Susanne Hoche; Peter Flach; David Hardcastle

  • Affiliations:
  • University of Bristol, Department of Computer Science, UK, email: hoche@cs.bris.ac.uk; University of Bristol, Department of Computer Science, UK, email: Peter.Flach@cs.bris.ac.uk; University of Bristol, Department of Computer Science, UK, email: Hardcastle.David@yahoo.co.uk

  • Venue:
  • Proceedings of the 2008 conference on ECAI 2008: 18th European Conference on Artificial Intelligence
  • Year:
  • 2008

Abstract

The analysis of large and complex networks, or graphs, is becoming increasingly important in many scientific areas, including machine learning, social network analysis and bioinformatics. One natural type of question that can be asked in network analysis is “Given two sets R and T of individuals in a graph with complete and missing knowledge, respectively, about a property of interest, which individuals in T are closest to R with respect to this property?” To answer this question, we can rank the individuals in T such that the individuals ranked highest are most likely to exhibit the property of interest. Several methods based on weighted paths in the graph and on Markov chain models have been proposed to solve this task. In this paper, we show that we can improve on previously published approaches by rephrasing the problem as the task of property prediction in graph-structured data from positive examples (the individuals in R) and unlabelled data (the individuals in T), and by applying an inexpensive iterative neighbourhood majority vote prediction algorithm (“iNMV”) to this task. We evaluate our iNMV prediction algorithm and two previously proposed methods using Markov chains on three real-world graphs in terms of the ROC AUC statistic. iNMV obtains rankings that are either significantly better or not significantly worse than the rankings obtained from the more complex algorithms based on Markov chains, while reducing run time by one order of magnitude on large graphs.
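The abstract does not spell out the iNMV update rule, so the following Python sketch only illustrates the general idea of ranking the unlabelled individuals in T by an iterated vote of their neighbours. It should not be read as the authors' algorithm. The function name `neighbourhood_vote_ranking`, the adjacency-dictionary graph representation, the soft (averaged) vote and the fixed iteration count `n_iter` are all assumptions made for illustration.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

# Assumed representation: an undirected graph as node -> set of neighbours.
Graph = Dict[Hashable, Set[Hashable]]


def neighbourhood_vote_ranking(
    graph: Graph,
    positives: Iterable[Hashable],
    unlabelled: Iterable[Hashable],
    n_iter: int = 5,
) -> List[Tuple[Hashable, float]]:
    """Rank unlabelled nodes by an iterated soft vote of their neighbours.

    Hypothetical sketch, not the published iNMV: positive nodes (the set R)
    keep a fixed score of 1.0, and each unlabelled node (the set T) is
    repeatedly assigned the mean score of its neighbours.
    """
    positives = set(positives)
    # Preserve the caller's order of T so that ties are broken predictably.
    targets = [node for node in unlabelled if node not in positives]

    # Every known node starts at 0.0, except the positives at 1.0.
    scores = {node: 0.0 for node in list(graph) + targets}
    for node in positives:
        scores[node] = 1.0

    for _ in range(n_iter):
        updated = dict(scores)  # synchronous update from the previous round
        for node in targets:
            neighbours = graph.get(node, set())
            if neighbours:
                total = sum(scores.get(nb, 0.0) for nb in neighbours)
                updated[node] = total / len(neighbours)
        scores = updated

    # Highest score first; Python's sort is stable, so ties keep input order.
    return sorted(
        ((node, scores[node]) for node in targets),
        key=lambda pair: pair[1],
        reverse=True,
    )


if __name__ == "__main__":
    # Toy path graph a - b - c - d with R = {a} and T = {b, c, d}.
    toy_graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
    for node, score in neighbourhood_vote_ranking(toy_graph, ["a"], ["b", "c", "d"]):
        print(node, score)
```

Each round of this sketch touches every edge incident to an unlabelled node once, so its cost grows linearly with the number of edges per iteration. That is one plausible reason a neighbourhood-vote scheme could be cheaper than Markov chain methods, though the abstract itself reports only the measured run-time reduction.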