Knowledge base completion via search-based question answering

Authors:
Robert West; Evgeniy Gabrilovich; Kevin Murphy; Shaohua Sun; Rahul Gupta; Dekang Lin
Affiliations:
Stanford University, Stanford, CA, USA; Google, Inc., Mountain View, CA, USA; Google, Inc., Mountain View, CA, USA; Google, Inc., Mountain View, CA, USA; Google, Inc., Mountain View, CA, USA; Google, Inc., Mountain View, CA, USA
Venue:
Proceedings of the 23rd International Conference on World Wide Web
Year:
2014

Abstract

Over the past few years, massive amounts of world knowledge have been accumulated in publicly available knowledge bases, such as Freebase, NELL, and YAGO. Yet despite their seemingly huge size, these knowledge bases are greatly incomplete. For example, over 70% of people included in Freebase have no known place of birth, and 99% have no known ethnicity. In this paper, we propose a way to leverage existing Web-search-based question-answering technology to fill in the gaps in knowledge bases in a targeted way. In particular, for each entity attribute, we learn the best set of queries to ask, such that the answer snippets returned by the search engine are most likely to contain the correct value for that attribute. For example, if we want to find Frank Zappa's mother, we could ask the query 'who is the mother of Frank Zappa'. However, this is likely to return 'The Mothers of Invention', which was the name of his band. Our system learns that it should (in this case) add disambiguating terms, such as Zappa's place of birth, in order to make it more likely that the search results contain snippets mentioning his mother. Our system also learns how many different queries to ask for each attribute, since in some cases, asking too many can hurt accuracy (by introducing false positives). We discuss how to aggregate candidate answers across multiple queries, ultimately returning probabilistic predictions for possible values for each attribute. Finally, we evaluate our system and show that it is able to extract a large number of facts with high confidence.
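
To make the pipeline described in the abstract more concrete, the following is a minimal Python sketch of two of its steps: building several queries for one entity attribute (optionally adding a disambiguating term such as the place of birth) and aggregating scored candidate answers across those queries into a probability-like distribution. Everything specific here is an assumption made for illustration: the function names `build_queries` and `aggregate`, the fixed query templates, the candidate labels, and the normalized-sum aggregation are hypothetical and are not taken from the paper, which learns its queries and aggregation from data. No search engine is called; the per-query candidate scores are hard-coded stand-ins.

```python
from collections import defaultdict


def build_queries(subject, attribute, known_facts, templates, max_queries):
    """Fill at most `max_queries` templates with the subject, the attribute,
    and any already-known facts (used as disambiguating terms)."""
    fields = {"subject": subject, "attribute": attribute, **known_facts}
    return [template.format(**fields) for template in templates[:max_queries]]


def aggregate(candidates_per_query):
    """Sum candidate scores over all queries and normalize them to sum to 1.

    This normalized sum is an illustrative assumption, not the paper's model.
    """
    totals = defaultdict(float)
    for candidates in candidates_per_query:
        for value, score in candidates:
            totals[value] += score
    norm = sum(totals.values())
    if norm == 0:
        return {}
    return {value: score / norm for value, score in totals.items()}


# Hypothetical templates; the second adds a disambiguating known fact.
templates = [
    "{subject} {attribute}",
    "{subject} {attribute} {place_of_birth}",
]
queries = build_queries(
    "Frank Zappa", "mother", {"place_of_birth": "Baltimore"}, templates, 2
)

# Mocked (candidate, score) pairs, one list per query, in place of real
# search-engine snippets. The candidate labels are placeholders.
mock_results = [
    [("candidate A", 0.6), ("candidate B", 0.4)],
    [("candidate A", 0.5)],
]
prediction = aggregate(mock_results)
print(queries)
print(prediction)
```

The `max_queries` parameter mirrors the abstract's point that the number of queries per attribute matters: in this sketch every additional query contributes its candidates to the totals, so a noisy query can shift probability mass toward a wrong value.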