Semi-supervised Classification from Discriminative Random Walks

Authors:
Jérôme Callut;Kevin Françoisse;Marco Saerens;Pierre Dupont
Affiliations:
UCL Machine Learning Group (MLG), and Louvain School of Management, ISYS unit, Université catholique de Louvain, Louvain-la-Neuve, Belgium B-1348;UCL Machine Learning Group (MLG), and Louvain School of Management, ISYS unit, Université catholique de Louvain, Louvain-la-Neuve, Belgium B-1348;UCL Machine Learning Group (MLG), and Louvain School of Management, ISYS unit, Université catholique de Louvain, Louvain-la-Neuve, Belgium B-1348;UCL Machine Learning Group (MLG), and Department of Computing Science and Engineering, INGI, Université catholique de Louvain, Louvain-la-Neuve, Belgium B-1348
Venue:
ECML PKDD '08 Proceedings of the 2008 European Conference on Machine Learning and Knowledge Discovery in Databases - Part I
Year:
2008

Citing 5

Fundamentals of speech recognition.
Learning kernels from biological networks by maximizing entropy. Bioinformatics.
Learning from labeled and unlabeled data on a directed graph. ICML '05 Proceedings of the 22nd international conference on Machine learning.
Classification in Networked Data: A Toolkit and a Univariate Case Study. The Journal of Machine Learning Research.
Efficient and simple generation of random simple connected graphs with prescribed degree sequence. COCOON'05 Proceedings of the 11th annual international conference on Computing and Combinatorics.

Cited 11

Within-Network Classification Using Local Structure Similarity. ECML PKDD '09 Proceedings of the European Conference on Machine Learning and Knowledge Discovery in Databases: Part I.
Semi-supervised classification and betweenness computation on large, sparse, directed graphs. Pattern Recognition.
Using the mutual k-nearest neighbor graphs for semi-supervised classification of natural language data. CoNLL '11 Proceedings of the Fifteenth Conference on Computational Natural Language Learning.
A nonparametric classification method based on K-associated graphs. Information Sciences: an International Journal.
An experimental investigation of kernels on graphs for collaborative recommendation and semisupervised classification. Neural Networks.
A hybrid text classification approach with low dependency on parameter by integrating K-nearest neighbor and support vector machine. Expert Systems with Applications: An International Journal.
An enhanced Support Vector Machine classification framework by using Euclidean distance function for text document categorization. Applied Intelligence.
Review of statistical network analysis: models, algorithms, and software. Statistical Analysis and Data Mining.
A link-analysis-based discriminant analysis for exploring partially labeled graphs. Pattern Recognition Letters.
2013 Special Issue: Detecting and preventing error propagation via competitive learning. Neural Networks.
GA-TVRC-Het: genetic algorithm enhanced time varying relational classifier for evolving heterogeneous networks. Data Mining and Knowledge Discovery.

Abstract

This paper describes a novel technique, called $\mathcal{D}$-walks, to tackle semi-supervised classification problems in large graphs. We introduce a betweenness measure based on passage times during random walks of bounded length. Such walks are further constrained to start and end in nodes within the same class, defining a distinct betweenness for each class. Unlabeled nodes are classified according to the class showing the highest betweenness. Forward and backward recurrences are derived to efficiently compute the passage times. $\mathcal{D}$-walks can deal with directed or undirected graphs, with a time complexity that is linear in the number of edges, the maximum walk length considered, and the number of classes. Experiments on various real-life databases show that $\mathcal{D}$-walks outperforms NetKit [5], the approach of Zhou and Schölkopf [15], and the regularized Laplacian kernel [2]. The benefit of $\mathcal{D}$-walks is particularly noticeable when few labeled nodes are available. The computation time of $\mathcal{D}$-walks is also substantially lower in all cases.
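
The forward and backward recurrences mentioned in the abstract can be illustrated with a small sketch. This is one plausible reading of the description, not the authors' implementation. Three details are assumptions of this sketch and are not stated in the abstract: walks start uniformly at random among the labeled nodes of the class, they stop at their first return to that class, and the betweenness is normalized by the probability that a walk ends within the maximum length. The function names (`transition_probs`, `class_betweenness`, `classify`) and the toy graph are hypothetical.

```python
from collections import defaultdict


def transition_probs(adj):
    """Row-normalize weighted out-edges {u: {v: w}} into transition probabilities."""
    probs = {}
    for u, nbrs in adj.items():
        total = sum(nbrs.values())
        probs[u] = {v: w / total for v, w in nbrs.items()} if total else {}
    return probs


def class_betweenness(adj, labels, cls, max_len):
    """Expected number of visits to each node during walks of at most max_len
    steps that start and end in nodes labeled cls.
    Assumed here: uniform start, walk stops at its first return to the class."""
    probs = transition_probs(adj)
    members = {u for u in adj if labels.get(u) == cls}
    if not members:
        return {u: 0.0 for u in adj}

    # Forward pass: alpha[t][q] = P(walk is at q after t steps without having
    # returned to the class before step t).
    alpha = [{u: 1.0 / len(members) for u in members}]
    for t in range(1, max_len + 1):
        cur = defaultdict(float)
        for u, mass in alpha[-1].items():
            if t > 1 and u in members:
                continue  # this walk already ended at u
            for v, p in probs[u].items():
                cur[v] += mass * p
        alpha.append(cur)

    # Backward pass: beta[t][q] = P(a walk leaving q first reaches the class
    # after exactly t steps).
    beta = [{u: 1.0 for u in members}]
    for t in range(1, max_len + 1):
        cur = defaultdict(float)
        for u in adj:
            for v, p in probs[u].items():
                if t > 1 and v in members:
                    continue  # would have reached the class too early
                cur[u] += p * beta[-1].get(v, 0.0)
        beta.append(cur)

    # Probability that a walk returns to the class within max_len steps.
    norm = sum(alpha[t].get(u, 0.0)
               for t in range(1, max_len + 1) for u in members)
    scores = {u: 0.0 for u in adj}
    if norm == 0:
        return scores
    for q in adj:
        if q in members:
            continue  # class members are endpoints, never intermediate nodes
        tail = 0.0    # running sum beta[1][q] + ... + beta[max_len - t][q]
        visits = 0.0
        for t in range(max_len - 1, 0, -1):
            tail += beta[max_len - t].get(q, 0.0)
            visits += alpha[t].get(q, 0.0) * tail
        scores[q] = visits / norm
    return scores


def classify(adj, labels, max_len):
    """Assign each unlabeled node to the class with the highest betweenness."""
    classes = sorted(set(labels.values()))
    scores = {c: class_betweenness(adj, labels, c, max_len) for c in classes}
    return {q: max(classes, key=lambda c: scores[c][q])
            for q in adj if q not in labels}


# Toy undirected path a - x - y - b with one labeled node per class.
adj = {"a": {"x": 1}, "x": {"a": 1, "y": 1},
       "y": {"x": 1, "b": 1}, "b": {"y": 1}}
labels = {"a": 0, "b": 1}
predicted = classify(adj, labels, max_len=4)
print(predicted)  # {'x': 0, 'y': 1}
```

Each pass touches every edge once per step, so the cost per class grows linearly with the number of edges and with the maximum walk length, which matches the complexity claimed in the abstract. The running sum `tail` avoids a nested loop over walk lengths when combining the forward and backward quantities.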