Semi-supervised Classification from Discriminative Random Walks

  • Authors:
  • Jérôme Callut;Kevin Françoisse;Marco Saerens;Pierre Dupont

  • Affiliations:
  • UCL Machine Learning Group (MLG), and Louvain School of Management, ISYS unit, Université catholique de Louvain, Louvain-la-Neuve, Belgium B-1348;UCL Machine Learning Group (MLG), and Louvain School of Management, ISYS unit, Université catholique de Louvain, Louvain-la-Neuve, Belgium B-1348;UCL Machine Learning Group (MLG), and Louvain School of Management, ISYS unit, Université catholique de Louvain, Louvain-la-Neuve, Belgium B-1348;UCL Machine Learning Group (MLG), and Department of Computing Science and Engineering, INGI, Université catholique de Louvain, Louvain-la-Neuve, Belgium B-1348

  • Venue:
  • ECML PKDD '08 Proceedings of the 2008 European Conference on Machine Learning and Knowledge Discovery in Databases - Part I
  • Year:
  • 2008

Abstract

This paper describes a novel technique, called $\mathcal{D}$-walks, to tackle semi-supervised classification problems in large graphs. We introduce a betweenness measure based on passage times during random walks of bounded length. Such walks are further constrained to start and end in nodes of the same class, which defines a distinct betweenness for each class. Each unlabeled node is assigned to the class for which its betweenness is highest. Forward and backward recurrences are derived to compute the passage times efficiently. $\mathcal{D}$-walks can deal with directed or undirected graphs, with a time complexity that is linear in the number of edges, the maximum walk length considered, and the number of classes. Experiments on various real-life databases show that $\mathcal{D}$-walks outperforms NetKit [5], the approach of Zhou and Schölkopf [15], and the regularized Laplacian kernel [2]. The benefit of $\mathcal{D}$-walks is particularly noticeable when few labeled nodes are available. The computation time of $\mathcal{D}$-walks is also substantially lower in all cases.
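The abstract does not give the recurrences themselves, so the following is only a minimal sketch of one way the idea could be realized, not the authors' implementation. It assumes that walks for a class start uniformly at random in that class's labeled nodes, stop the first time they return to any node of that class, and are kept only if they stop within `L` steps. The betweenness of a node is then taken as the expected number of visits to it over those walks. The function names (`transition_lists`, `dwalk_betweenness`, `classify`), the uniform start distribution, and the normalization are all assumptions made for illustration.

```python
def transition_lists(adj):
    """Row-normalize weighted adjacency lists [(neighbor, weight), ...]."""
    P = []
    for nbrs in adj:
        total = sum(w for _, w in nbrs)
        P.append([(j, w / total) for j, w in nbrs] if total > 0 else [])
    return P


def dwalk_betweenness(P, class_nodes, L):
    """Expected visits to each node outside class_nodes during walks that
    start in class_nodes and first return to class_nodes within L steps."""
    n, S = len(P), set(class_nodes)
    # alpha[t][q]: probability of being at q after t steps with no visit to S
    # at steps 1..t-1 (assumed uniform start over S).
    alpha = [[0.0] * n for _ in range(L + 1)]
    for q in S:
        alpha[0][q] = 1.0 / len(S)
    # beta[t][q]: probability that a walk leaving q first hits S after t steps.
    beta = [[0.0] * n for _ in range(L + 1)]
    for t in range(1, L + 1):          # one pass over all edges per step
        for i in range(n):
            ended = t > 1 and i in S   # walk already returned to the class
            for j, p in P[i]:
                if not ended:
                    alpha[t][j] += alpha[t - 1][i] * p
                if j in S:
                    if t == 1:
                        beta[t][i] += p
                else:
                    beta[t][i] += p * beta[t - 1][j]
    # Probability that the walk returns to S within L steps.
    z = sum(alpha[t][q] for t in range(1, L + 1) for q in S)
    scores = [0.0] * n
    if z == 0.0:
        return scores
    for q in range(n):
        if q in S:
            continue
        cum = [0.0] * (L + 1)          # cum[k] = beta[1][q] + ... + beta[k][q]
        for s in range(1, L + 1):
            cum[s] = cum[s - 1] + beta[s][q]
        # Visit q at step t, then return to S within the remaining L - t steps.
        scores[q] = sum(alpha[t][q] * cum[L - t] for t in range(1, L)) / z
    return scores


def classify(adj, labels, L):
    """Assign each unlabeled node to the class with the highest betweenness.
    Ties go to the class that sorts first (an arbitrary choice)."""
    P = transition_lists(adj)
    classes = sorted(set(labels.values()))
    per_class = {
        y: dwalk_betweenness(P, [q for q, c in labels.items() if c == y], L)
        for y in classes
    }
    return {q: max(classes, key=lambda y: per_class[y][q])
            for q in range(len(adj)) if q not in labels}


# Toy example: an undirected path 0-1-2-3-4-5 with one labeled node per end.
path = [[(1, 1.0)], [(0, 1.0), (2, 1.0)], [(1, 1.0), (3, 1.0)],
        [(2, 1.0), (4, 1.0)], [(3, 1.0), (5, 1.0)], [(4, 1.0)]]
labels = {0: "a", 5: "b"}
pred = classify(path, labels, L=6)
print(pred)  # {1: 'a', 2: 'a', 3: 'b', 4: 'b'}
```

In this sketch each class needs `L` passes over the edge list, which matches the linear dependence on edges, walk length, and number of classes stated in the abstract. Nodes nearer a labeled end of the toy path receive that end's class because short walks from that end visit them far more often.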