Bias-Free hypothesis evaluation in multirelational domains

Authors:
Christine Körner;Stefan Wrobel
Affiliations:
Fraunhofer Institut Autonome Intelligente Systeme, Germany;Fraunhofer Institut Autonome Intelligente Systeme, Germany
Venue:
PAKDD'06 Proceedings of the 10th Pacific-Asia conference on Advances in Knowledge Discovery and Data Mining
Year:
2006

Citing 3
Cited 0

Learning Probabilistic Relational Models
SARA '02 Proceedings of the 4th International Symposium on Abstraction, Reformulation, and Approximation

Autocorrelation and linkage cause bias in evaluation of relational learners
ILP'02 Proceedings of the 12th international conference on Inductive logic programming

Discriminative probabilistic models for relational data
UAI'02 Proceedings of the Eighteenth conference on Uncertainty in artificial intelligence

Abstract

In propositional domains, evaluating a hypothesis on a separate test set obtained by random sampling or cross-validation is generally considered to give an unbiased estimate of the true error. In multirelational domains, previous work has noted that linkage between objects may bias these procedures and has proposed corrected sampling procedures. However, as we show in this paper, the existing procedures address only one particular case of the bias introduced by linkage. We therefore introduce generalized subgraph sampling, a sampling procedure based on bin packing, which ensures that test sets are chosen to match the probability of re-encountering previously seen objects and which includes previous approaches as a special case. Experiments with data from the Internet Movie Database illustrate the performance of our algorithm.
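
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' procedure. It assumes each object links to exactly one entity (a hypothetical movie-to-studio mapping), keeps every group of linked objects in a single fold, and balances fold sizes with a simple greedy largest-first heuristic in the spirit of bin packing. This covers only the case in which test objects never re-encounter a linked entity from the training set; matching an arbitrary target re-encounter probability, as the generalized procedure is described as doing, is not implemented here. The function names and the toy data are illustrative assumptions.

```python
from collections import defaultdict


def pack_groups_into_folds(object_to_group, n_folds):
    """Assign whole groups of linked objects to folds.

    object_to_group maps an object id to the id of the entity it links to
    (hypothetical example: movie -> studio). Groups are placed largest
    first, each into the currently smallest fold (a greedy heuristic).
    """
    groups = defaultdict(list)
    for obj, grp in object_to_group.items():
        groups[grp].append(obj)

    folds = [[] for _ in range(n_folds)]
    # Sort by descending group size; ties are broken by group id.
    for grp in sorted(groups, key=lambda g: (-len(groups[g]), str(g))):
        smallest = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[smallest].extend(groups[grp])
    return folds


def reencounter_rate(train, test, object_to_group):
    """Fraction of test objects whose linked entity also occurs in train."""
    if not test:
        return 0.0
    seen = {object_to_group[o] for o in train}
    return sum(object_to_group[o] in seen for o in test) / len(test)


# Toy data (invented for illustration): six movies linked to three studios.
movies = {"m1": "s1", "m2": "s1", "m3": "s1",
          "m4": "s2", "m5": "s2", "m6": "s3"}

folds = pack_groups_into_folds(movies, n_folds=2)
grouped_rate = reencounter_rate(folds[0], folds[1], movies)

# A split that ignores linkage, for comparison.
naive_rate = reencounter_rate(["m1", "m4", "m6"], ["m2", "m3", "m5"], movies)

print(folds)
print(grouped_rate, naive_rate)
```

On this toy data the grouped split leaves no studio shared between the two folds, whereas in the linkage-ignoring split every test movie belongs to a studio already seen in training. The `reencounter_rate` helper is one way to check how far a given split is from the re-encounter probability expected at deployment time.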