Web-scale knowledge extraction from semi-structured tables

Authors:
Eric Crestan;Patrick Pantel
Affiliations:
Yahoo! Labs, Sunnyvale, CA, USA;Yahoo! Labs, Sunnyvale, CA, USA
Venue:
Proceedings of the 19th international conference on World wide web
Year:
2010

Abstract

A wealth of knowledge is encoded in tables on the World Wide Web. We propose a classification algorithm and a rich feature set for automatically recognizing layout tables and attribute/value tables. We report the frequencies of these table types in a large-scale analysis of the Web and outline open challenges in extracting semantic triples (knowledge) from attribute/value tables. We then describe a solution to a key problem in extracting semantic triples: protagonist detection, i.e., finding the subject of the table, which is often not present in the table itself. For 79% of our Web tables, our method returns the correct protagonist among its top three candidates.
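
To make the notion of a semantic triple concrete, the following is a minimal illustrative sketch, not the authors' method. It assumes an attribute/value table has already been recognized and reduced to (attribute, value) rows, and that a protagonist has already been chosen by some detector. The function name `rows_to_triples`, the sample table, and the protagonist string are all hypothetical.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]  # (protagonist, attribute, value)


def rows_to_triples(protagonist: str, rows: List[Tuple[str, str]]) -> List[Triple]:
    """Pair each (attribute, value) row with the table's protagonist.

    Assumption: the protagonist is supplied by the caller; detecting it
    is the hard problem the abstract describes and is not attempted here.
    """
    triples: List[Triple] = []
    for attribute, value in rows:
        # Normalize cell text: trim whitespace and a trailing colon.
        attribute = attribute.strip().rstrip(":").strip()
        value = value.strip()
        if attribute and value:  # skip rows with an empty cell
            triples.append((protagonist, attribute, value))
    return triples


# Hypothetical attribute/value table from a product page.
rows = [("Resolution:", "12 MP"), ("Weight", "450 g"), ("", "n/a")]
triples = rows_to_triples("Example Camera X", rows)
for triple in triples:
    print(triple)
```

The sketch shows why the protagonist matters: without it, the rows yield only attribute/value pairs with no subject to attach them to.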