Understanding Propagation Error and Its Effect on Collective Classification

  • Authors: Rongjing Xiang; Jennifer Neville


  • Venue: ICDM '11 Proceedings of the 2011 IEEE 11th International Conference on Data Mining
  • Year: 2011


Abstract

Recent empirical evaluation has shown that the performance of collective classification models can vary based on the amount of class label information available for use during inference. In this paper, we further demonstrate that the relative performance of statistical relational models learned with different estimation methods changes as the availability of test set labels increases. We reason about the cause of this phenomenon from an information-theoretic perspective, which points to a previously unidentified consideration in the development of relational learning algorithms. In particular, we characterize the high propagation error of collective inference models that are estimated with maximum pseudolikelihood estimation (MPLE), and show how this affects performance across the spectrum of label availability when compared to MLE, which has low propagation error. Our formal study leads to a quantitative characterization that can be used to predict the confidence of local propagation for MPLE models. We use this to propose a mixture model that can learn the best trade-off between high- and low-propagation models. Empirical evaluation on synthetic and real-world data shows that our proposed method achieves comparable or superior results to both MPLE and low-propagation models across the full spectrum of label availability.
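The trade-off the abstract describes can be illustrated with a toy sketch: two collective inference runs on the same small graph, one with a strong relational coupling (high propagation) and one with a weak coupling (low propagation), combined as a convex mixture of their predictions. All weights, the deterministic mean-field-style update, and the fixed mixing coefficient below are illustrative assumptions for exposition; this is not the paper's estimator.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def conditional_prob(node, probs, adj, w_self, w_edge):
    # P(y_node = 1 | neighbors): logistic in the (soft) neighbor labels,
    # in the spirit of a pseudolikelihood-style local conditional.
    s = w_self + w_edge * sum(probs[n] for n in adj[node])
    return sigmoid(s)

# Toy 4-node chain; nodes 0 and 3 are observed positive, 1 and 2 are unlabeled.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
observed = {0: 1.0, 3: 1.0}

def collective_inference(w_self, w_edge, sweeps=20):
    # Deterministic label propagation: repeatedly update each unlabeled
    # node's probability from its neighbors' current soft labels.
    probs = {0: 1.0, 1: 0.0, 2: 0.0, 3: 1.0}
    for _ in range(sweeps):
        for node in (1, 2):  # unlabeled nodes only
            probs[node] = conditional_prob(node, probs, adj, w_self, w_edge)
    return probs

# Strong coupling propagates label information aggressively;
# weak coupling relies more on the node's own prior.
high = collective_inference(w_self=-1.0, w_edge=2.0)
low = collective_inference(w_self=-1.0, w_edge=0.5)

# Hypothetical fixed mixing weight; the paper learns this trade-off.
alpha = 0.5
mixture = {n: alpha * high[n] + (1 - alpha) * low[n] for n in (1, 2)}
```

Because the mixture is a convex combination, each node's mixed prediction lies between the high- and low-propagation predictions, which is the sense in which such a model can interpolate across the label-availability spectrum.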