Factoring web tables

  • Authors:
  • David W. Embley; Mukkai Krishnamoorthy; George Nagy; Sharad Seth

  • Affiliations:
  • Brigham Young University, Provo, UT; Rensselaer Polytechnic Institute, Troy, NY; Rensselaer Polytechnic Institute, Troy, NY; University of Nebraska-Lincoln, Lincoln, NE

  • Venue:
  • IEA/AIE'11: Proceedings of the 24th International Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems: Modern Approaches in Applied Intelligence, Volume Part I
  • Year:
  • 2011

Abstract

Automatic interpretation of web tables can enable database-like semantic search over the plethora of information stored in tables on the web. The table interpretation method presented here converts the two-dimensional hierarchy of table headers, which provides a visual means of assimilating complex data, into a set of strings that is more amenable to algorithmic analysis of table structure. We show that Header Paths, a new, purely syntactic representation of visual tables, can be readily transformed ("factored") into several existing representations of structured data, including category trees and relational tables. Detailed examination of over 100 tables reveals which table features require further work.
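
To make the idea of turning a header hierarchy into strings more concrete, the following is a minimal sketch, not the authors' implementation. It assumes the column headers have already been extracted as a grid in which spanning cells are repeated across the columns they cover. The "*" separator, the function names header_paths and category_tree, and the sample year/quarter table are all hypothetical choices made for illustration.

```python
from typing import Dict, List

SEP = "*"  # assumed path separator; the paper's own notation may differ


def header_paths(header_rows: List[List[str]]) -> List[str]:
    """Build one root-to-leaf path string per data column.

    header_rows is a grid of header labels, top row first, with
    spanning cells already repeated over every column they cover.
    Empty labels are skipped.
    """
    n_cols = len(header_rows[0])
    paths = []
    for col in range(n_cols):
        labels = [row[col] for row in header_rows if row[col]]
        paths.append(SEP.join(labels))
    return paths


def category_tree(paths: List[str]) -> Dict[str, dict]:
    """Merge path strings into a nested dict, one level per header row."""
    tree: Dict[str, dict] = {}
    for path in paths:
        node = tree
        for label in path.split(SEP):
            node = node.setdefault(label, {})
    return tree


# Hypothetical two-level column header: years spanning quarters.
rows = [
    ["2010", "2010", "2011", "2011"],
    ["Q1", "Q2", "Q1", "Q2"],
]
paths = header_paths(rows)
tree = category_tree(paths)
print(paths)
print(tree)
```

In this sketch, shared path prefixes collapse into shared parents, so the flat list of strings yields a small category tree. Producing relational tables from the same paths would need the row headers and data cells as well, which this example leaves out.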