Empirical comparison of "hard" and "soft" label propagation for relational classification

Authors:
Aram Galstyan; Paul R. Cohen
Affiliations:
USC Information Sciences Institute, Center for Research on Unexpected Events, Marina del Rey, CA; USC Information Sciences Institute, Center for Research on Unexpected Events, Marina del Rey, CA
Venue:
ILP'07: Proceedings of the 17th International Conference on Inductive Logic Programming
Year:
2007

Abstract

In this paper we distinguish between hard and soft label propagation for classifying relational (networked) data. Soft propagation assigns probabilities or class-membership scores to data instances and then propagates these scores through the network. Hard propagation instead explicitly propagates class labels at each iteration. We present a comparative empirical study of the two methods on a relational binary classification task and evaluate both on synthetic and real-world relational data. Our results indicate that neither approach dominates the other over the entire range of input data parameters, and that there are interesting, non-trivial tradeoffs between them.
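The Python sketch below makes the distinction concrete on an undirected graph with a few labeled seed nodes. It is not the authors' implementation. Several details are illustrative assumptions:

- the adjacency-list representation;
- the hypothetical function names `soft_propagation`, `hard_propagation` and `soft_to_labels`;
- the neighbor-averaging rule for scores and the majority-vote rule for labels;
- the 0.5 starting score and decision threshold;
- the synchronous update schedule.

```python
from typing import Dict, List, Optional

# Hypothetical graph representation: node id -> list of neighbor ids (undirected).
Graph = Dict[int, List[int]]


def soft_propagation(graph: Graph, seeds: Dict[int, int], iters: int = 50) -> Dict[int, float]:
    """Propagate class-1 membership scores in [0, 1]; seed scores stay fixed."""
    # Unlabeled nodes start at an uninformative 0.5 (an assumption).
    score = {v: float(seeds[v]) if v in seeds else 0.5 for v in graph}
    for _ in range(iters):
        new = dict(score)
        for v, nbrs in graph.items():
            if v in seeds or not nbrs:
                continue
            # Each unlabeled node takes the mean of its neighbors' previous scores.
            new[v] = sum(score[u] for u in nbrs) / len(nbrs)
        score = new
    return score


def soft_to_labels(score: Dict[int, float], threshold: float = 0.5) -> Dict[int, int]:
    """Turn scores into hard 0/1 labels; a score exactly at the threshold maps to 0."""
    return {v: int(s > threshold) for v, s in score.items()}


def hard_propagation(graph: Graph, seeds: Dict[int, int], iters: int = 50) -> Dict[int, Optional[int]]:
    """Propagate explicit 0/1 labels by majority vote over labeled neighbors."""
    label: Dict[int, Optional[int]] = {v: seeds.get(v) for v in graph}
    for _ in range(iters):
        new = dict(label)
        for v, nbrs in graph.items():
            if v in seeds:
                continue
            votes = [label[u] for u in nbrs if label[u] is not None]
            ones = sum(votes)
            zeros = len(votes) - ones
            if ones > zeros:
                new[v] = 1
            elif zeros > ones:
                new[v] = 0
            # On a tie, or with no labeled neighbors, the current label is kept.
        if new == label:
            break  # no label changed during this sweep
        label = new
    return label


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    seeds = {0: 1, 3: 0}
    print(soft_to_labels(soft_propagation(path, seeds)))  # {0: 1, 1: 1, 2: 0, 3: 0}
    print(hard_propagation(path, seeds))                  # {0: 1, 1: 1, 2: 0, 3: 0}
```

Consider the path 0-1-2-3 with node 0 seeded as class 1 and node 3 as class 0. The soft scores of nodes 1 and 2 approach 2/3 and 1/3, and thresholding them yields the same labels that hard propagation reaches. On the five-node path 0-1-2-3-4 with seeds at the two ends, the middle node is a tie. Hard propagation leaves it unlabeled (`None`), while soft propagation assigns it a score of exactly 0.5.

Updates are synchronous: each sweep reads only the previous sweep's values. As a result, the outcome does not depend on the order in which nodes are visited.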