Lightweight web-based fact repositories for textual question answering

  • Authors:
  • Marius Paşca

  • Affiliations:
  • Google Inc., Mountain View, CA

  • Venue:
  • Proceedings of the Sixteenth ACM Conference on Information and Knowledge Management (CIKM 2007)
  • Year:
  • 2007

Abstract

Since answers to fact-seeking questions usually reside within small factual text nuggets, often "hidden" within full-length documents, their relevance to a question is not necessarily correlated with the relevance of the full-length document to the question. Yet previous approaches to open-domain textual question answering from large document collections almost unanimously employ a document retrieval stage, so that widely different, often expensive answer-mining techniques are applied to only a small subset of documents. Depending on the collection size, 95% or more of the documents in the collection (far more in the case of the Web) are left out of the selected subset for any given query, and thus become invisible to the subsequent stages that perform the actual answer mining. This paper introduces a new model of answer retrieval for question answering. The collection is distilled offline into large repositories of facts. Each fact constitutes a potential direct answer to questions seeking a particular kind of entity or relation, such as questions asking about the date of particular events. Question answering then becomes equivalent to online fact retrieval, which greatly simplifies the de facto system architecture for fact-seeking question answering. Beyond this simplicity, experiments on a fact repository acquired from approximately a billion Web documents illustrate the impact of fact repositories on extracting accurate answers to a standard evaluation set of open-domain test questions, as well as to additional sets of domain-specific questions.
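The two-stage architecture described above (offline distillation of the collection into facts, then online fact retrieval in place of document retrieval) can be illustrated with a minimal Python sketch. It rests on several assumptions that do not come from the paper: a single hypothetical relation (birth year, i.e. the date of a particular event), a hand-written extraction pattern `BIRTH_PATTERN`, a matching question template `QUESTION_PATTERN`, hypothetical helpers `build_repository` and `answer`, and ranking of candidate answers by extraction frequency. It is only meant to show how answering reduces to a lookup once facts have been extracted, not to reproduce the paper's extraction or scoring method.

```python
import re
from collections import Counter, defaultdict

# Hypothetical extraction pattern for one relation (birth year): a run of
# capitalized words followed by "was born in <4-digit year>".
BIRTH_PATTERN = re.compile(
    r"(?P<subject>[A-Z]\w+(?: [A-Z]\w+)*) was born in (?P<year>\d{4})\b"
)

# Hypothetical question template for the same relation.
QUESTION_PATTERN = re.compile(r"when was (?P<subject>.+?) born\??$", re.IGNORECASE)


def build_repository(sentences):
    """Offline stage: distill raw sentences into (relation, subject) -> answer counts."""
    repository = defaultdict(Counter)
    for sentence in sentences:
        for match in BIRTH_PATTERN.finditer(sentence):
            key = ("birth_year", match.group("subject").lower())
            repository[key][match.group("year")] += 1
    return repository


def answer(question, repository):
    """Online stage: answering reduces to a lookup in the fact repository."""
    match = QUESTION_PATTERN.match(question.strip())
    if match is None:
        return None  # question type not covered by this sketch
    # dict.get does not insert missing keys into the defaultdict.
    counts = repository.get(("birth_year", match.group("subject").lower()))
    if not counts:
        return None  # no stored fact for this subject
    # Assumed heuristic: prefer the answer extracted most often (redundancy).
    return counts.most_common(1)[0][0]


# Usage: no document retrieval happens at question time.
corpus = [
    "Wolfgang Amadeus Mozart was born in 1756.",
    "Biographers note that Wolfgang Amadeus Mozart was born in 1756 in Salzburg.",
    "A forum post wrongly says Wolfgang Amadeus Mozart was born in 1765.",
]
repo = build_repository(corpus)
print(answer("When was Wolfgang Amadeus Mozart born?", repo))  # prints 1756
```

Because every sentence in the corpus is visited during the offline stage, no fact is excluded by a per-query document filter; the cost moves from query time to extraction time, and coverage is limited instead by how many relations and patterns the repository supports.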