A fine-grained taxonomy of tables on the web

Authors:
Eric Crestan;Patrick Pantel
Affiliations:
Yahoo! Labs, Sunnyvale, CA, USA;Microsoft Research, Redmond, WA, USA
Venue:
CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Year:
2010

Citing 5
Cited 2

A machine learning based approach for table detection on the web

Proceedings of the 11th international conference on World Wide Web
Flexible Web Document Analysis for Delivery to Narrow-Bandwidth Devices

ICDAR '01 Proceedings of the Sixth International Conference on Document Analysis and Recognition
Mining tables from large scale HTML texts

COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
Towards domain-independent information extraction from web tables

Proceedings of the 16th international conference on World Wide Web
Context-aware query suggestion by mining click-through and session data

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining

Web table discrimination with composition of rich structural and content information

Applied Soft Computing
Using linked data to mine RDF from wikipedia's tables

Proceedings of the 7th ACM international conference on Web search and data mining

Quantified Score

Hi-index	0.00

Visualization

Abstract

We propose a classification taxonomy over a large crawl of HTML tables on the Web, focusing primarily on those tables that express structured knowledge. The taxonomy separates tables into two top-level classes: a) those used for layout purposes, including navigational and formatting; and b) those containing relational knowledge, including listings, attribute/value, matrix, enumeration, and form. We then propose a classification algorithm for automatically detecting a subset of the classes in our taxonomy, namely layout tables and attribute/value tables. We report on the performance of our system over a large sample of manually annotated HTML tables on the Web.