A wealth of knowledge is encoded in the form of tables on the World Wide Web. We propose a classification algorithm and a rich feature set for automatically recognizing layout tables and attribute/value tables. We report the frequencies of these table types in a large-scale analysis of the Web and propose open challenges for extracting semantic triples (knowledge) from attribute/value tables. We then describe a solution to a key problem in extracting semantic triples: protagonist detection, i.e., finding the subject of the table, which is often not present in the table itself. In 79% of our Web tables, our method finds the correct protagonist among its top three returned candidates.
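To make the two ideas above concrete (pairing a detected protagonist with attribute/value rows to form triples, and scoring a ranker by whether the correct protagonist appears among its top three candidates), here is a minimal sketch. It is not the authors' method: the function names `rows_to_triples` and `top_k_hit_rate` are hypothetical, and it assumes the table has already been classified as attribute/value and that protagonist candidates have already been ranked by some other component.

```python
from typing import List, Tuple

# A semantic triple: (subject, attribute, value).
Triple = Tuple[str, str, str]


def rows_to_triples(protagonist: str, rows: List[Tuple[str, str]]) -> List[Triple]:
    """Attach a protagonist (the table's subject) to each (attribute, value) row.

    Hypothetical helper: rows with an empty attribute or value are skipped.
    """
    return [
        (protagonist, attr.strip(), value.strip())
        for attr, value in rows
        if attr.strip() and value.strip()
    ]


def top_k_hit_rate(ranked: List[List[str]], gold: List[str], k: int = 3) -> float:
    """Fraction of tables whose gold protagonist is among the first k candidates.

    `ranked[i]` is the ranked candidate list for table i, and `gold[i]` is
    the correct protagonist for that table.
    """
    if not gold:
        return 0.0
    hits = sum(1 for cands, g in zip(ranked, gold) if g in cands[:k])
    return hits / len(gold)
```

With illustrative (made-up) inputs, `rows_to_triples("Canon EOS 5D", [("Weight", "810 g")])` yields one triple whose subject is the protagonist, and `top_k_hit_rate` with `k=3` corresponds to the kind of top-three measure reported in the abstract.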