Leveraging label-independent features for classification in sparsely labeled networks: an empirical study

Authors:
Brian Gallagher; Tina Eliassi-Rad
Affiliations:
Lawrence Livermore National Laboratory, Livermore, CA; Lawrence Livermore National Laboratory, Livermore, CA
Venue:
SNAKDD'08 Proceedings of the Second international conference on Advances in social network mining and analysis
Year:
2008

Abstract

We address the problem of within-network classification in sparsely labeled networks. Recent work has demonstrated success with statistical relational learning (SRL) and semi-supervised learning (SSL) on such problems. However, both approaches rely on the availability of labeled nodes to infer the values of missing labels. When few labels are available, the performance of these approaches can degrade. In addition, many such approaches are sensitive to the specific set of nodes labeled. So, although average performance may be acceptable, the performance on a specific task may not. We explore a complementary approach to within-network classification, based on the use of label-independent (LI) features, i.e., features calculated without using the values of class labels. While previous work has made some use of LI features, the effects of these features on classification performance have not been extensively studied. Here, we present an empirical study to better understand these effects. Through experiments on several real-world data sets, we show that the use of LI features produces classifiers that are less sensitive to specific label assignments and can lead to performance improvements of over 40% for both SRL- and SSL-based classifiers. We also examine the relative utility of individual LI features and show that, in many cases, it is a combination of a few diverse network-based structural characteristics that is most informative.
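
To make the notion of a label-independent feature concrete, the following is a minimal Python sketch of computing structural features for each node from the network topology alone. The abstract does not name the specific features studied, so the two used here (node degree and local clustering coefficient), the function name `label_independent_features`, and the toy graph are illustrative assumptions, not the authors' feature set.

```python
from itertools import combinations


def label_independent_features(adj):
    """Compute per-node structural features for an undirected graph.

    `adj` maps each node to the set of its neighbors. Class labels are
    never consulted, which is what makes the features label-independent.
    The feature choice (degree, clustering) is an assumption made for
    illustration only.
    """
    features = {}
    for node, nbrs in adj.items():
        degree = len(nbrs)
        # Count the edges that exist between pairs of this node's neighbors.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        # Number of neighbor pairs that could be connected.
        possible = degree * (degree - 1) / 2
        clustering = links / possible if possible else 0.0
        features[node] = {"degree": degree, "clustering": clustering}
    return features


# Hypothetical toy network: a triangle (a, b, c) with a pendant node d.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
feats = label_independent_features(graph)
```

Features of this kind could be appended to the input of a relational or semi-supervised classifier; because they depend only on the graph structure, they are available for every node regardless of how many labels are known.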