Web-scale knowledge extraction from semi-structured tables

  • Authors:
  • Eric Crestan; Patrick Pantel

  • Affiliations:
  • Yahoo! Labs, Sunnyvale, CA, USA; Yahoo! Labs, Sunnyvale, CA, USA

  • Venue:
  • Proceedings of the 19th International Conference on World Wide Web
  • Year:
  • 2010

Abstract

A wealth of knowledge is encoded in the form of tables on the World Wide Web. We propose a classification algorithm and a rich feature set for automatically recognizing layout tables and attribute/value tables. We report the frequencies of these table types in a large-scale analysis of the Web and outline open challenges in extracting semantic triples (knowledge) from attribute/value tables. We then describe a solution to a key problem in extracting semantic triples: protagonist detection, i.e., finding the subject of the table, which is often not present in the table itself. In 79% of our Web tables, our method finds the correct protagonist among its top three returned candidates.
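
To make the notions of semantic triples and top-three protagonist accuracy concrete, the following is a minimal Python sketch and not the authors' method. It assumes an attribute/value table has already been recognized and reduced to (attribute, value) pairs, and that a protagonist has already been chosen. The function names `table_to_triples` and `top_k_accuracy`, and the example data, are hypothetical and used only for illustration.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def table_to_triples(protagonist: str, rows: List[Tuple[str, str]]) -> List[Triple]:
    """Turn attribute/value rows into (protagonist, attribute, value) triples.

    Assumption: each row is an (attribute, value) pair taken from a table
    that was already classified as an attribute/value table.
    """
    triples = []
    for attribute, value in rows:
        # Normalize labels such as "Director:" to "Director".
        attribute = attribute.strip().rstrip(":").strip()
        value = value.strip()
        # Skip rows with an empty attribute or value.
        if attribute and value:
            triples.append((protagonist, attribute, value))
    return triples


def top_k_accuracy(ranked: List[List[str]], gold: List[str], k: int = 3) -> float:
    """Fraction of tables whose gold protagonist is among the top-k candidates.

    Assumption: ranked[i] is the ordered candidate list for table i and
    gold[i] is its annotated protagonist.
    """
    if not gold:
        return 0.0
    hits = sum(1 for cands, g in zip(ranked, gold) if g in cands[:k])
    return hits / len(gold)


if __name__ == "__main__":
    rows = [("Director:", "Ridley Scott"), ("Year", " 1982 ")]
    for triple in table_to_triples("Blade Runner", rows):
        print(triple)
```

Under these assumptions, a figure such as the 79% reported above corresponds to `top_k_accuracy(..., k=3)` computed over the evaluated tables; how the candidates are generated and ranked is the subject of the paper and is not modeled here.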