Web table taxonomy and formalization

  • Authors:
  • Larissa R. Lautert; Marcelo M. Scheidt; Carina F. Dorneles

  • Affiliations:
  • Universidade Federal de Santa Catarina, Florianópolis, SC, Brazil; Universidade Federal de Santa Catarina, Florianópolis, SC, Brazil; Universidade Federal de Santa Catarina, Florianópolis, SC, Brazil

  • Venue:
  • ACM SIGMOD Record
  • Year:
  • 2013

Abstract

The Web is the largest repository of data available, with over 150 million high-quality tables. Several works have sought to support queries over these tables, but challenges remain, such as the wide variety of table structures found on the Web. In this paper, we propose a taxonomy of tabular structures and formalize those used with relational data. We show, through an experimental evaluation, that WTClassifier, our supervised framework, classifies Web tables with high accuracy. Additionally, we use WTClassifier to categorize more than 300 thousand Web tables into our taxonomy and find that 82.25% are not formatted like a relational structure.