Identifying interesting assertions from the web

  • Authors:
  • Thomas Lin;Oren Etzioni;James Fogarty

  • Affiliations:
  • University of Washington, Seattle, WA, USA;University of Washington, Seattle, WA, USA;University of Washington, Seattle, WA, USA

  • Venue:
  • Proceedings of the 18th ACM conference on Information and knowledge management
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

How can we cull the facts we need from the overwhelming mass of information and misinformation that is the Web? The TextRunner extraction engine represents one approach, in which people pose keyword queries or simple questions and TextRunner returns concise answers based on tuples extracted from Web text. Unfortunately, the results returned by engines such as TextRunner include both informative facts (e.g., the FDA banned ephedra) and less useful statements (e.g., the FDA banned products). This paper therefore investigates filtering TextRunner results to enable people to better focus on interesting assertions. We first develop three distinct models of what assertions are likely to be interesting in response to a query. We then fully operationalize each of these models as a filter over TextRunner results. Finally, we develop a more sophisticated filter that combines the different models using relevance feedback. In a study of human ratings of the interestingness of TextRunner assertions, we show that our approach substantially enhances the quality of TextRunner results. Our best filter raises the fraction of interesting results in the top thirty from 41.6% to 64.1%.