Semi-supervised classification and betweenness computation on large, sparse, directed graphs

Authors:
Amin Mantrach;Nicolas van Zeebroeck;Pascal Francq;Masashi Shimbo;Hugues Bersini;Marco Saerens
Affiliations:
IRIDIA Laboratory, Université Libre de Bruxelles, 50 Av. Fr. Roosevelt, B-1050 Brussels, Belgium;IRIDIA Laboratory, Université Libre de Bruxelles, 50 Av. Fr. Roosevelt, B-1050 Brussels, Belgium;ISYS/LSM & Machine Learning Group, Université de Louvain, Place des Doyens 1, B-1348 Louvain-la-Neuve, Belgium;Graduate School of Information Science, Nara Institute of Science and Technology, 8916-5 Takayama, Ikoma, Nara, Japan;IRIDIA Laboratory, Université Libre de Bruxelles, 50 Av. Fr. Roosevelt, B-1050 Brussels, Belgium;ISYS/LSM & Machine Learning Group, Université de Louvain, Place des Doyens 1, B-1348 Louvain-la-Neuve, Belgium
Venue:
Pattern Recognition
Year:
2011

Citing 27

Statistical methods for speech recognition
Diffusion Kernels on Graphs and Other Discrete Input Spaces. ICML '02 Proceedings of the Nineteenth International Conference on Machine Learning
Learning from Labeled and Unlabeled Data using Graph Mincuts. ICML '01 Proceedings of the Eighteenth International Conference on Machine Learning
Kernel Methods for Pattern Analysis
Automatic multimedia cross-modal correlation discovery. Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Application of kernels to link analysis. Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Predictive low-rank decomposition for kernel methods. ICML '05 Proceedings of the 22nd international conference on Machine learning
Learning from labeled and unlabeled data on a directed graph. ICML '05 Proceedings of the 22nd international conference on Machine learning
An Experimental Investigation of Graph Kernels on a Collaborative Recommendation Task. ICDM '06 Proceedings of the Sixth International Conference on Data Mining
Fast Random Walk with Restart and Its Applications. ICDM '06 Proceedings of the Sixth International Conference on Data Mining
The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology
Classification in Networked Data: A Toolkit and a Univariate Case Study. The Journal of Machine Learning Research
Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation. IEEE Transactions on Knowledge and Data Engineering
Fast direction-aware proximity for graph mining. Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
Random walk with restart: fast solutions and applications. Knowledge and Information Systems
Fast incremental proximity search in large graphs. Proceedings of the 25th international conference on Machine learning
Graph transduction via alternating minimization. Proceedings of the 25th international conference on Machine learning
A family of dissimilarity measures between nodes generalizing both the shortest-path and the commute-time distances. Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Semi-supervised Classification from Discriminative Random Walks. ECML PKDD '08 Proceedings of the 2008 European Conference on Machine Learning and Knowledge Discovery in Databases - Part I
Graph nodes clustering with the sigmoid commute-time kernel: A comparative study. Data & Knowledge Engineering
Linear Neighborhood Propagation and Its Applications. IEEE Transactions on Pattern Analysis and Machine Intelligence
Randomized shortest-path problems: Two related models. Neural Computation
Affinity measures based on the graph Laplacian. TextGraphs-3 Proceedings of the 3rd Textgraphs Workshop on Graph-Based Algorithms for Natural Language Processing
Introduction to Semi-Supervised Learning
Graph nodes clustering based on the commute-time kernel. PAKDD'07 Proceedings of the 11th Pacific-Asia conference on Advances in knowledge discovery and data mining
The Sum-over-Paths Covariance Kernel: A Novel Covariance Measure between Nodes of a Directed Graph. IEEE Transactions on Pattern Analysis and Machine Intelligence
Semi-Supervised Learning

Cited 7

An experimental investigation of kernels on graphs for collaborative recommendation and semisupervised classification. Neural Networks
Review of statistical network analysis: models, algorithms, and software. Statistical Analysis and Data Mining
A link-analysis-based discriminant analysis for exploring partially labeled graphs. Pattern Recognition Letters
Aggregation pheromone metaphor for semi-supervised classification. Pattern Recognition
A second order cone programming approach for semi-supervised learning. Pattern Recognition
Capturing the essence of word-of-mouth for social commerce: Assessing the quality of online e-commerce reviews by a semi-supervised approach. Decision Support Systems
On the characterization of noise filters for self-training semi-supervised in nearest neighbor classification. Neurocomputing

Abstract

This work addresses graph-based semi-supervised classification and betweenness computation in large, sparse networks (several million nodes). The objective of semi-supervised classification is to assign a label to each unlabeled node using the whole topology of the graph and the labels at our disposal. Two approaches are developed to avoid explicitly computing pairwise proximities between the nodes of the graph, which would be impractical for graphs containing millions of nodes. The first approach directly computes, for each class, the sum of the similarities between the nodes to classify and the labeled nodes of the class, as initially suggested in [1,2]. Following this approach, two algorithms exploiting different state-of-the-art kernels on a graph are developed. The same strategy can also be used to compute a betweenness measure. The second approach works on a trellis structure built from biased random walks on the graph, extending an idea introduced in [3]. These random walks make it possible to define a biased, bounded betweenness for the nodes of interest, defined separately for each class. All the proposed algorithms have a computing time linear in the number of edges while providing good results, and hence are applicable to large sparse networks. They are empirically validated on medium-size standard data sets and are shown to be competitive with state-of-the-art techniques. Finally, we processed a novel data set, which is made available for benchmarking, for multi-class classification in a large network: the U.S. patents citation network, containing 3M nodes (of six different classes) and 38M edges. The three proposed algorithms achieve competitive results (around 85% classification rate) on this large network; they classify the unlabeled nodes within a few minutes on a standard workstation.
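
The first approach (summing, for each class, the similarities to that class's labeled nodes without ever forming the pairwise similarity matrix) can be illustrated with a small sketch. The abstract does not name the kernels used, so the code below assumes a stand-in: a truncated, discounted random-walk series K = sum over t of alpha^t P^t, where P is the row-normalized adjacency matrix. The function names `class_scores` and `classify`, the parameters `alpha` and `steps`, and the toy graph are all hypothetical and are not taken from the paper. The point is only that K applied to a class indicator vector can be accumulated through repeated sparse products, each costing time proportional to the number of edges.

```python
def class_scores(out_edges, labels, classes, alpha=0.5, steps=20):
    """Per-class sum of similarities, computed without building a kernel matrix.

    out_edges: dict mapping every node to the list of its successors
               (assumption: every successor is also a key of the dict).
    labels:    dict mapping each labeled node to its class.
    Stand-in kernel (an assumption, not the paper's): sum_t alpha^t P^t.
    """
    nodes = list(out_edges)
    scores = {}
    for c in classes:
        # Indicator vector of the labeled nodes of class c.
        current = {v: 1.0 if labels.get(v) == c else 0.0 for v in nodes}
        total = dict(current)  # t = 0 term of the series
        weight = 1.0
        for _ in range(steps):
            # One sparse product P @ current: every edge is visited once.
            nxt = {}
            for v in nodes:
                succ = out_edges[v]
                nxt[v] = sum(current[w] for w in succ) / len(succ) if succ else 0.0
            weight *= alpha
            for v in nodes:
                total[v] += weight * nxt[v]
            current = nxt
        scores[c] = total
    return scores


def classify(out_edges, labels, classes, **kwargs):
    """Assign each unlabeled node the class with the largest summed similarity."""
    scores = class_scores(out_edges, labels, classes, **kwargs)
    return {v: max(classes, key=lambda c: scores[c][v])
            for v in out_edges if v not in labels}


# Toy directed graph: two triangles (0,1,2) and (3,4,5) joined by 2 <-> 3.
toy_graph = {0: [1, 2], 1: [0, 2], 2: [1, 0, 3],
             3: [4, 5, 2], 4: [3, 5], 5: [4, 3]}
toy_labels = {0: "a", 5: "b"}
predicted = classify(toy_graph, toy_labels, ["a", "b"])
```

Under these assumptions the total cost is proportional to (number of classes) x (number of steps) x (number of edges), which is the kind of linear-in-edges behavior the abstract refers to; on a real graph the inner loop would normally be a sparse matrix-vector product.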