Web-scale table census and classification

Authors:
Eric Crestan; Patrick Pantel
Affiliations:
Yahoo! Labs, Sunnyvale, CA, USA; Microsoft Research, Redmond, WA, USA
Venue:
Proceedings of the fourth ACM international conference on Web search and data mining
Year:
2011

Citing (9):

A machine learning based approach for table detection on the web
Proceedings of the 11th international conference on World Wide Web

Flexible Web Document Analysis for Delivery to Narrow-Bandwidth Devices
ICDAR '01 Proceedings of the Sixth International Conference on Document Analysis and Recognition

Mining tables from large scale HTML texts
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1

Towards domain-independent information extraction from web tables
Proceedings of the 16th international conference on World Wide Web

Context-aware query suggestion by mining click-through and session data
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining

Towards intent-driven bidterm suggestion
Proceedings of the 18th international conference on World wide web

Overview of autofeed: an unsupervised learning system for generating webfeeds
AAAI'06 proceedings of the 21st national conference on Artificial intelligence - Volume 2

Identifying synonyms among distributionally similar words
IJCAI'03 Proceedings of the 18th international joint conference on Artificial intelligence

Harvesting relational tables from lists on the web
Proceedings of the VLDB Endowment

Cited by (5):

Answering table queries on the web using column keywords
Proceedings of the VLDB Endowment

Scientific table type classification in digital library
Proceedings of the 2012 ACM symposium on Document engineering

Wikipedia entity expansion and attribute extraction from the web using semi-supervised learning
Proceedings of the sixth ACM international conference on Web search and data mining

Web table taxonomy and formalization
ACM SIGMOD Record

Using linked data to mine RDF from wikipedia's tables
Proceedings of the 7th ACM international conference on Web search and data mining

Abstract

We report on a census of the types of HTML tables on the Web, organized by a fine-grained classification taxonomy that describes the semantics the tables express. For each relational table type, we describe the open challenges in extracting semantic triples, i.e., knowledge, from tables of that type. We also present TabEx, a supervised framework for web-scale HTML table classification, and apply it to the task of classifying HTML tables into our taxonomy. Through a large-scale experimental analysis over a crawl of the Web, we show empirically that its classification accuracy significantly exceeds that of several baselines. We present a detailed feature analysis and outline the most salient features for each table type.