A machine learning based approach for table detection on the web
Proceedings of the 11th international conference on World Wide Web
Flexible Web Document Analysis for Delivery to Narrow-Bandwidth Devices
ICDAR '01 Proceedings of the Sixth International Conference on Document Analysis and Recognition
Mining tables from large scale HTML texts
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
Towards domain-independent information extraction from web tables
Proceedings of the 16th international conference on World Wide Web
Context-aware query suggestion by mining click-through and session data
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Web table discrimination with composition of rich structural and content information
Applied Soft Computing
Using linked data to mine RDF from wikipedia's tables
Proceedings of the 7th ACM international conference on Web search and data mining
Hi-index | 0.00 |
We propose a classification taxonomy over a large crawl of HTML tables on the Web, focusing primarily on those tables that express structured knowledge. The taxonomy separates tables into two top-level classes: a) those used for layout purposes, including navigational and formatting; and b) those containing relational knowledge, including listings, attribute/value, matrix, enumeration, and form. We then propose a classification algorithm for automatically detecting a subset of the classes in our taxonomy, namely layout tables and attribute/value tables. We report on the performance of our system over a large sample of manually annotated HTML tables on the Web.