Over the past few years, massive amounts of world knowledge have been accumulated in publicly available knowledge bases, such as Freebase, NELL, and YAGO. Yet despite their seemingly huge size, these knowledge bases are highly incomplete. For example, over 70% of people included in Freebase have no known place of birth, and 99% have no known ethnicity. In this paper, we propose a way to leverage existing Web-search-based question-answering technology to fill in the gaps in knowledge bases in a targeted way. In particular, for each entity attribute, we learn the best set of queries to ask, such that the answer snippets returned by the search engine are most likely to contain the correct value for that attribute. For example, if we want to find Frank Zappa's mother, we could issue the query "who is the mother of Frank Zappa". However, this is likely to return "The Mothers of Invention", which was the name of his band. Our system learns that it should (in this case) add disambiguating terms, such as Zappa's place of birth, to make it more likely that the search results contain snippets mentioning his mother. Our system also learns how many different queries to ask for each attribute, since in some cases asking too many can hurt accuracy by introducing false positives. We discuss how to aggregate candidate answers across multiple queries, ultimately returning probabilistic predictions for possible values of each attribute. Finally, we evaluate our system and show that it is able to extract a large number of facts with high confidence.
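To make the query-augmentation and aggregation ideas concrete, the following is a minimal sketch and not the paper's actual method. The function names (`build_queries`, `aggregate`), the template strings, the per-query weights, and the candidate scores are all hypothetical, and the weighted-sum normalization is an assumed stand-in for whatever learned aggregation the system uses. The placeholder string `(mother's name)` stands for the correct answer.

```python
from collections import defaultdict


def build_queries(entity, attribute, known_facts, templates):
    """Fill each query template with the entity, the attribute name,
    and any already-known facts used as disambiguating terms."""
    return [
        template.format(entity=entity, attribute=attribute, **known_facts)
        for template in templates
    ]


def aggregate(candidates_per_query, query_weights):
    """Combine scored candidate answers from several queries.

    Each candidate's score is multiplied by the (assumed, pre-learned)
    weight of the query that produced it; the totals are then normalized
    so they sum to 1 and can be read as rough probabilities.
    """
    totals = defaultdict(float)
    for query, candidates in candidates_per_query.items():
        weight = query_weights.get(query, 0.0)
        for answer, score in candidates:
            totals[answer] += weight * score
    norm = sum(totals.values())
    if norm == 0:
        return {}
    return {answer: value / norm for answer, value in totals.items()}


# Hypothetical templates: one plain, one with a disambiguating term.
templates = ["{entity} {attribute}", "{entity} {attribute} {place_of_birth}"]
queries = build_queries(
    "Frank Zappa", "mother", {"place_of_birth": "Baltimore"}, templates
)

# Toy candidate scores, invented for illustration only.
candidates = {
    queries[0]: [("The Mothers of Invention", 0.6), ("(mother's name)", 0.2)],
    queries[1]: [("(mother's name)", 0.7)],
}
weights = {queries[0]: 0.5, queries[1]: 1.0}
predictions = aggregate(candidates, weights)
```

In this toy setup the disambiguated query carries a higher weight, so the band name that dominates the plain query ends up with a lower aggregated probability than the placeholder for the correct answer.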