DIRT - discovery of inference rules from text
Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Freebase: a collaboratively created graph database for structuring human knowledge
Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Open information extraction from the web
Communications of the ACM - Surviving the data deluge
Scaling textual inference to the web
EMNLP '08 Proceedings of the Conference on Empirical Methods in Natural Language Processing
Unsupervised methods for determining object and relation synonyms on the web
Journal of Artificial Intelligence Research
Distant supervision for relation extraction without labeled data
ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2
Learning first-order Horn clauses from web text
EMNLP '10 Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing
Modeling relations and their mentions without labeled text
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part III
Structured relation discovery using generative models
EMNLP '11 Proceedings of the Conference on Empirical Methods in Natural Language Processing
YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia
Artificial Intelligence
Universal schema for entity type prediction
Proceedings of the 2013 workshop on Automated knowledge base construction
In data integration we transform information from a source into a target schema. A general problem in this task is loss of fidelity and coverage: the source expresses more knowledge than fits into the target schema, or knowledge that is hard to fit into any schema at all. This problem is taken to an extreme in information extraction (IE), where the source is natural language. To address this issue, one can either automatically learn a latent schema emergent in text (a brittle and ill-defined task) or manually extend schemas. We propose instead to store data in a probabilistic database of universal schema. This schema is simply the union of all source schemas, and the probabilistic database learns to predict the cells of each source relation in this union. For example, the database could store Freebase relations alongside relations that correspond to natural-language surface patterns. The database would learn to predict which Freebase relations hold based on which surface patterns appear, and vice versa. We describe an analogy between such databases and collaborative filtering models, and use it to implement our paradigm with probabilistic PCA, a scalable and effective collaborative filtering method.
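The collaborative-filtering analogy can be made concrete with a small sketch. The toy data below is entirely hypothetical: rows are entity pairs, columns are relations in the universal schema (one Freebase relation unioned with two surface patterns), and a simple logistic matrix factorization stands in for the probabilistic PCA model the abstract mentions. One Freebase cell is hidden during training, and the learned factors are used to predict it from the co-occurring surface patterns.

```python
import numpy as np

# Hypothetical universal-schema matrix: rows are entity pairs, columns are
# source relations -- one Freebase relation plus two natural-language
# surface patterns, unioned into a single schema.
relations = ["fb:employee_of", "pat:X-works-for-Y", "pat:X-joined-Y"]
pairs = [("Smith", "IBM"), ("Jones", "Acme"), ("Kim", "Initech"),
         ("Doe", "Acme"), ("Lee", "IBM")]

# Observed cells: 1 = the relation holds / the pattern was seen in text.
Y = np.array([[1, 1, 1],
              [1, 1, 1],
              [1, 0, 1],
              [0, 0, 0],
              [0, 1, 1]], dtype=float)

# Hide the Freebase cell for (Lee, IBM); the model should score it from
# the surface patterns that co-occur with fb:employee_of elsewhere.
mask = np.ones_like(Y)
mask[4, 0] = 0.0

rng = np.random.default_rng(0)
k = 2                                               # latent dimension
A = 0.1 * rng.standard_normal((len(pairs), k))      # entity-pair factors
V = 0.1 * rng.standard_normal((len(relations), k))  # relation factors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Gradient ascent on the masked Bernoulli log-likelihood: logistic matrix
# factorization as a simple stand-in for probabilistic PCA.
lr = 0.1
for _ in range(3000):
    G = mask * (Y - sigmoid(A @ V.T))  # gradient w.r.t. the logits
    A, V = A + lr * (G @ V), V + lr * (G.T @ A)

pred = sigmoid(A @ V.T)[4, 0]
print(f"P(fb:employee_of holds for (Lee, IBM)) = {pred:.2f}")
```

The key design point this illustrates is that the model never needs an alignment between schemas: Freebase relations and surface patterns are just columns of the same matrix, and the low-rank factorization ties them together through shared entity-pair embeddings.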