Web-scale information extraction in knowitall: (preliminary results)

  • Authors:
  • Oren Etzioni;Michael Cafarella;Doug Downey;Stanley Kok;Ana-Maria Popescu;Tal Shaked;Stephen Soderland;Daniel S. Weld;Alexander Yates

  • Affiliations:
  • University of Washington, Seattle, WA;University of Washington, Seattle, WA;University of Washington, Seattle, WA;University of Washington, Seattle, WA;University of Washington, Seattle, WA;University of Washington, Seattle, WA;University of Washington, Seattle, WA;University of Washington, Seattle, WA;University of Washington, Seattle, WA

  • Venue:
  • Proceedings of the 13th international conference on World Wide Web
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

Manually querying search engines in order to accumulate a large bodyof factual information is a tedious, error-prone process of piecemealsearch. Search engines retrieve and rank potentially relevantdocuments for human perusal, but do not extract facts, assessconfidence, or fuse information from multiple documents. This paperintroduces KnowItAll, a system that aims to automate the tedious process ofextracting large collections of facts from the web in an autonomous,domain-independent, and scalable manner.The paper describes preliminary experiments in which an instance of KnowItAll, running for four days on a single machine, was able to automatically extract 54,753 facts. KnowItAll associates a probability with each fact enabling it to trade off precision and recall. The paper analyzes KnowItAll's architecture and reports on lessons learned for the design of large-scale information extraction systems.