DIRT - discovery of inference rules from text
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Freebase: a collaboratively created graph database for structuring human knowledge
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Open information extraction from the web
Communications of the ACM - Surviving the data deluge
Scaling textual inference to the web
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Unsupervised methods for determining object and relation synonyms on the web
Journal of Artificial Intelligence Research
Distant supervision for relation extraction without labeled data
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2
Learning first-order Horn clauses from web text
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Modeling relations and their mentions without labeled text
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part III
Structured relation discovery using generative models
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia
Artificial Intelligence
Universal schema for entity type prediction
Proceedings of the 2013 workshop on Automated knowledge base construction
In data integration we transform information from a source into a target schema. A general problem in this task is loss of fidelity and coverage: the source expresses more knowledge than fits into the target schema, or knowledge that is hard to fit into any schema at all. This problem is taken to an extreme in information extraction (IE), where the source is natural language. To address this issue, one can either automatically learn a latent schema emergent in text (a brittle and ill-defined task) or manually extend schemas. We propose instead to store data in a probabilistic database of universal schema. This schema is simply the union of all source schemas, and the probabilistic database learns to predict the cells of each source relation in this union. For example, the database could store Freebase relations alongside relations that correspond to natural-language surface patterns. The database would learn to predict which Freebase relations hold based on which surface patterns appear, and vice versa. We describe an analogy between such databases and collaborative filtering models, and use it to implement our paradigm with probabilistic PCA, a scalable and effective collaborative filtering method.
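The collaborative-filtering analogy can be made concrete with a small sketch. The toy data below is entirely hypothetical: rows are entity pairs, columns are relations in the universal schema (one Freebase relation unioned with two surface patterns), and a simple logistic matrix factorization stands in for the probabilistic PCA model the abstract mentions. One Freebase cell is hidden during training, and the learned factors are used to predict it from the co-occurring surface patterns.

```python
import numpy as np

# Hypothetical universal-schema matrix: rows are entity pairs, columns are
# source relations -- one Freebase relation plus two natural-language
# surface patterns, unioned into a single schema.
relations = ["fb:employee_of", "pat:X-works-for-Y", "pat:X-joined-Y"]
pairs = [("Smith", "IBM"), ("Jones", "Acme"), ("Kim", "Initech"),
         ("Doe", "Acme"), ("Lee", "IBM")]

# Observed cells: 1 = the relation holds / the pattern was seen in text.
Y = np.array([[1, 1, 1],
              [1, 1, 1],
              [1, 0, 1],
              [0, 0, 0],
              [0, 1, 1]], dtype=float)

# Hide the Freebase cell for (Lee, IBM); the model should score it from
# the surface patterns that co-occur with fb:employee_of elsewhere.
mask = np.ones_like(Y)
mask[4, 0] = 0.0

rng = np.random.default_rng(0)
k = 2                                               # latent dimension
A = 0.1 * rng.standard_normal((len(pairs), k))      # entity-pair factors
V = 0.1 * rng.standard_normal((len(relations), k))  # relation factors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Gradient ascent on the masked Bernoulli log-likelihood: logistic matrix
# factorization as a simple stand-in for probabilistic PCA.
lr = 0.1
for _ in range(3000):
    G = mask * (Y - sigmoid(A @ V.T))  # gradient w.r.t. the logits
    A, V = A + lr * (G @ V), V + lr * (G.T @ A)

pred = sigmoid(A @ V.T)[4, 0]
print(f"P(fb:employee_of holds for (Lee, IBM)) = {pred:.2f}")
```

The key design point this illustrates is that the model never needs an alignment between schemas: Freebase relations and surface patterns are just columns of the same matrix, and the low-rank factorization ties them together through shared entity-pair embeddings.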