Organizing and searching the world wide web of facts - step one: the one-million fact extraction challenge

  • Authors:
  • Marius Pasca; Dekang Lin; Jeffrey Bigham; Andrei Lifchits; Alpa Jain

  • Affiliations:
  • Google Inc., Mountain View, CA; Google Inc., Mountain View, CA; Google Inc., Univ. of Washington, Seattle, WA; Google Inc., Univ. of British Columbia, Vancouver, BC; Google Inc., Columbia Univ., New York, NY

  • Venue:
  • AAAI'06: Proceedings of the 21st National Conference on Artificial Intelligence - Volume 2
  • Year:
  • 2006


Abstract

Due to the inherent difficulty of processing noisy text, the potential of the Web as a decentralized repository of human knowledge remains largely untapped during Web search. Access to billions of binary relations among named entities would enable new search paradigms and alternative methods for presenting search results. A first concrete step towards building large searchable repositories of factual knowledge is to derive such knowledge automatically, at large scale, from textual documents. Generalized contextual extraction patterns allow for fast iterative progression towards extracting one million facts of a given type (e.g., Person-BornIn-Year) from 100 million Web documents of arbitrary quality. The extraction starts from as few as 10 seed facts, requires no additional input knowledge or annotated text, and emphasizes scale and coverage by avoiding the use of syntactic parsers, named entity recognizers, gazetteers, and similar text processing tools and resources.
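To make the bootstrapping idea in the abstract concrete, the sketch below shows a minimal, hypothetical version of pattern-based fact extraction for a Person-BornIn-Year relation: seed pairs yield connecting contexts, the contexts are reused as patterns, and matched pairs are added back as new facts. All function names are illustrative, and verbatim infix strings stand in for the paper's generalized contextual patterns; crude regular expressions replace any named-entity tooling, which the paper deliberately avoids.

```python
import re
from collections import Counter

def find_patterns(sentences, seed_facts, min_support=2):
    """Collect short infix contexts that connect a seed (name, year) pair in a sentence."""
    counts = Counter()
    for s in sentences:
        for name, year in seed_facts:
            # Look for "<name> <infix> <year>" with a short connecting infix.
            m = re.search(re.escape(name) + r"\s+(.{1,40}?)\s+" + re.escape(year), s)
            if m:
                counts[m.group(1)] += 1
    # Keep only infixes seen with several distinct seeds (a crude reliability filter).
    return [p for p, c in counts.most_common() if c >= min_support]

def apply_patterns(sentences, patterns):
    """Match learned infixes against candidate name/year spans to extract new facts."""
    facts = set()
    name_re = r"([A-Z][a-z]+(?:\s[A-Z][a-z]+)+)"  # capitalized multi-word span as a proper-name proxy
    year_re = r"\b(1[6-9]\d{2}|20\d{2})\b"        # plausible birth years
    for s in sentences:
        for p in patterns:
            for m in re.finditer(name_re + r"\s+" + re.escape(p) + r"\s+" + year_re, s):
                facts.add((m.group(1), m.group(2)))
    return facts

def bootstrap(sentences, seeds, iterations=3):
    """One iteration per pass: seeds -> patterns -> new facts; repeat to grow the fact set."""
    facts = set(seeds)
    for _ in range(iterations):
        patterns = find_patterns(sentences, facts)
        facts |= apply_patterns(sentences, patterns)
    return facts

if __name__ == "__main__":
    sentences = [
        "Wolfgang Amadeus Mozart was born in 1756 in Salzburg.",
        "Marie Curie was born in 1867 in Warsaw.",
        "Alan Turing was born in 1912 in London.",
    ]
    seeds = {("Wolfgang Amadeus Mozart", "1756"), ("Marie Curie", "1867")}
    print(bootstrap(sentences, seeds))
```

At Web scale, the paper's approach replaces such verbatim infixes with generalized patterns so that a handful of seeds can cover many surface variations; this toy loop only illustrates the overall iterative structure.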