Factorizing YAGO: scalable machine learning for linked data

Authors:
Maximilian Nickel; Volker Tresp; Hans-Peter Kriegel
Affiliations:
Ludwig-Maximilians-University Munich, Munich, Germany; Siemens AG, Munich, Germany; Ludwig-Maximilians-University Munich, Munich, Germany
Venue:
Proceedings of the 21st international conference on World Wide Web
Year:
2012

Abstract

Vast amounts of structured information have been published in the Semantic Web's Linked Open Data (LOD) cloud and their size is still growing rapidly. Yet, access to this information via reasoning and querying is sometimes difficult, due to LOD's size, partial data inconsistencies and inherent noisiness. Machine Learning offers an alternative approach to exploiting LOD's data with the advantages that Machine Learning algorithms are typically robust to both noise and data inconsistencies and are able to efficiently utilize non-deterministic dependencies in the data. From a Machine Learning point of view, LOD is challenging due to its relational nature and its scale. Here, we present an efficient approach to relational learning on LOD data, based on the factorization of a sparse tensor that scales to data consisting of millions of entities, hundreds of relations and billions of known facts. Furthermore, we show how ontological knowledge can be incorporated in the factorization to improve learning results and how computation can be distributed across multiple nodes. We demonstrate that our approach is able to factorize the YAGO 2 core ontology and globally predict statements for this large knowledge base using a single dual-core desktop computer. Furthermore, we show experimentally that our approach achieves good results in several relational learning tasks that are relevant to Linked Data. Once a factorization has been computed, our model is able to predict efficiently, and without any additional training, the likelihood of any of the 4.3 ⋅ 10^14 possible triples in the YAGO 2 core ontology.
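
The abstract does not spell out the form of the factorization. As a minimal sketch, assume a RESCAL-style bilinear model in which each relation slice X_k of the entity × entity × relation tensor is approximated as A R_k Aᵀ, so that a candidate triple (s, p, o) is scored as a_sᵀ R_p a_o. The function names, the toy factor values and the rank of 2 below are illustrative assumptions, not taken from the paper; in practice A and R would be learned from the known facts.

```python
from typing import List, Set, Tuple

Triple = Tuple[int, int, int]  # (subject, relation, object) as integer indices
Matrix = List[List[float]]


def score(A: Matrix, R: List[Matrix], s: int, p: int, o: int) -> float:
    """Bilinear score a_s^T R_p a_o for one candidate triple."""
    a_s, a_o, R_p = A[s], A[o], R[p]
    rank = len(a_s)
    return sum(a_s[i] * R_p[i][j] * a_o[j]
               for i in range(rank) for j in range(rank))


def count_possible_triples(n_entities: int, n_relations: int) -> int:
    """Every (subject, relation, object) combination is a candidate triple."""
    return n_entities * n_entities * n_relations


def rank_objects(A: Matrix, R: List[Matrix], s: int, p: int,
                 known: Set[Triple]) -> List[int]:
    """Rank candidate objects for the query (s, p, ?), skipping known facts."""
    candidates = [(score(A, R, s, p, o), o)
                  for o in range(len(A)) if (s, p, o) not in known]
    return [o for _, o in sorted(candidates, reverse=True)]


# Toy factors (assumed, not learned): 3 entities, 1 relation, rank 2.
# Entities 0 and 2 share the same latent vector, so they behave alike.
A = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 0.0]]
R = [[[0.0, 1.0],
      [0.0, 0.0]]]

# Only known facts are stored, which keeps the tensor sparse.
known = {(0, 0, 1)}

observed = score(A, R, 0, 0, 1)    # the known fact
unobserved = score(A, R, 2, 0, 1)  # never observed, predicted via shared latents
ranking = rank_objects(A, R, 2, 0, known)
```

Scoring a triple needs only two rows of A and one small matrix R_p, which is why predictions require no further training once the factors exist. The number of candidates grows as n² · m for n entities and m relations, while storage grows only with the number of known facts.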