Scalable relation prediction exploiting both intrarelational correlation and contextual information

  • Authors:
  • Xueyan Jiang;Volker Tresp;Yi Huang;Maximilian Nickel;Hans-Peter Kriegel

  • Affiliations:
  • Ludwig Maximilian University of Munich, Munich, Germany;Corporate Technology, Siemens AG, Munich, Germany, Ludwig Maximilian University of Munich, Munich, Germany;Corporate Technology, Siemens AG, Munich, Germany, Ludwig Maximilian University of Munich, Munich, Germany;Ludwig Maximilian University of Munich, Munich, Germany;Ludwig Maximilian University of Munich, Munich, Germany

  • Venue:
  • ECML PKDD'12: Proceedings of the 2012 European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part I
  • Year:
  • 2012

Abstract

We consider the problem of predicting instantiated binary relations in a multi-relational setting and exploit both intrarelational correlations and contextual information. For the modular combination we discuss simple heuristics, additive models and an approach that can be motivated from a hierarchical Bayesian perspective. In the concrete examples we consider models that exploit contextual information both from the database and from contextual unstructured information, e.g., information extracted from textual documents describing the involved entities. By using low-rank approximations in the context models, the models perform latent semantic analyses and can generalize across specific terms, i.e., the model might use similar latent representations for semantically related terms. All the approaches we are considering have unique solutions. They can exploit sparse matrix algebra and are thus highly scalable and can easily be generalized to new entities. We evaluate the effectiveness of nonlinear interaction terms and reduce the number of terms by applying feature selection. For the optimization of the context model we use an alternating least squares approach. We experimentally analyze scalability. We validate our approach using two synthetic data sets and using two data sets derived from the Linked Open Data (LOD) cloud.
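
The abstract mentions low-rank approximations in the context models and an alternating least squares optimization. As a rough illustration of that general technique only, and not of the authors' actual model, the following sketch fits a ridge-regularized low-rank factorization X ≈ A Bᵀ with NumPy. The function name `als_low_rank`, the parameters `rank`, `lam` and `n_iters`, and the toy entity-by-feature matrix are all assumptions made for this example.

```python
import numpy as np


def als_low_rank(X, rank, lam=0.1, n_iters=50, seed=0):
    """Fit X ~= A @ B.T by alternating ridge-regularized least squares.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    A = rng.standard_normal((n_rows, rank))
    B = rng.standard_normal((n_cols, rank))
    reg = lam * np.eye(rank)
    for _ in range(n_iters):
        # Fix B and solve the ridge problem for A:
        #   A = X B (B^T B + lam I)^{-1}
        A = np.linalg.solve(B.T @ B + reg, B.T @ X.T).T
        # Fix A and solve the ridge problem for B:
        #   B = X^T A (A^T A + lam I)^{-1}
        B = np.linalg.solve(A.T @ A + reg, A.T @ X).T
    return A, B


if __name__ == "__main__":
    # Toy data (assumed): 30 entities described by 20 context features,
    # generated from a rank-2 latent structure.
    rng = np.random.default_rng(1)
    X = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 20))
    A, B = als_low_rank(X, rank=2, lam=1e-3)
    rel_err = np.linalg.norm(X - A @ B.T) / np.linalg.norm(X)
    print(f"relative reconstruction error: {rel_err:.4f}")
```

With `lam > 0`, each half-step is a ridge regression with a closed-form solution, which is why the loop body needs only two linear solves. This sketch uses dense arrays for brevity; the sparse matrix algebra that the abstract credits for scalability is not shown here.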