A fine-grained taxonomy of tables on the web

  • Authors:
  • Eric Crestan;Patrick Pantel

  • Affiliations:
  • Yahoo! Labs, Sunnyvale, CA, USA;Microsoft Research, Redmond, WA, USA

  • Venue:
  • CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

We propose a classification taxonomy over a large crawl of HTML tables on the Web, focusing primarily on those tables that express structured knowledge. The taxonomy separates tables into two top-level classes: a) those used for layout purposes, including navigational and formatting; and b) those containing relational knowledge, including listings, attribute/value, matrix, enumeration, and form. We then propose a classification algorithm for automatically detecting a subset of the classes in our taxonomy, namely layout tables and attribute/value tables. We report on the performance of our system over a large sample of manually annotated HTML tables on the Web.