We report on a census of the types of HTML tables on the Web, organized by a fine-grained classification taxonomy that describes the semantics the tables express. For each relational table type, we describe open challenges in extracting semantic triples, i.e., knowledge, from such tables. We also present TabEx, a supervised framework for web-scale HTML table classification, and apply it to classifying HTML tables into our taxonomy. Through a large-scale experimental analysis over a crawl of the Web, we show empirically that TabEx's classification accuracy significantly exceeds that of several baselines. We present a detailed feature analysis and outline the most salient features for each table type.