The Web is the largest available repository of data, containing over 150 million high-quality tables. Several works have tried to enable queries over these tables, but challenges remain, such as the many different structures that tables take on the Web. In this paper, we propose a taxonomy of tabular structures, formalize the structures used to represent relational data, and show through an experimental evaluation that WTClassifier, our supervised framework, classifies Web tables with high accuracy. Additionally, we use WTClassifier to categorize more than 300 thousand Web tables into our taxonomy and find that 82.25% of them are not formatted like a relational structure.
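The abstract does not describe which features or model WTClassifier uses. Purely as a hypothetical illustration, the sketch below shows the kind of structural signals a Web table classifier might compute from a table grid (rectangularity, a textual header row, type-consistent columns). It uses a naive hand-written rule in place of a learned model. The function names, the features, and the rule are all assumptions made for illustration, not the paper's method.

```python
from typing import Dict, List

# Hypothetical representation: a Web table as a list of rows of cell strings.
Table = List[List[str]]


def is_numeric(cell: str) -> bool:
    """Return True if the cell parses as a float after removing thousands separators."""
    try:
        float(cell.replace(",", ""))
        return True
    except ValueError:
        return False


def structural_features(table: Table) -> Dict[str, object]:
    """Extract a few illustrative structural features (hypothetical, not WTClassifier's)."""
    n_rows = len(table)
    widths = {len(row) for row in table}
    rectangular = len(widths) == 1          # every row has the same number of cells
    n_cols = max(widths) if widths else 0
    header = table[0] if table else []
    # A header row is "textual" if every cell is non-empty and non-numeric.
    header_is_text = bool(header) and all(
        c.strip() and not is_numeric(c) for c in header
    )
    # Count body columns whose non-empty cells are all numeric or all non-numeric.
    consistent = 0
    body = table[1:]
    if rectangular and body:
        for j in range(n_cols):
            cells = [row[j] for row in body if row[j].strip()]
            if len({is_numeric(c) for c in cells}) <= 1:
                consistent += 1
    return {
        "n_rows": n_rows,
        "n_cols": n_cols,
        "rectangular": rectangular,
        "header_is_text": header_is_text,
        "type_consistency": consistent / n_cols if n_cols else 0.0,
    }


def looks_relational(table: Table) -> bool:
    """Naive stand-in rule (an assumption): rectangular grid, textual header,
    and fully type-consistent columns."""
    f = structural_features(table)
    return (
        f["n_rows"] >= 2
        and f["n_cols"] >= 2
        and f["rectangular"]
        and f["header_is_text"]
        and f["type_consistency"] == 1.0
    )


if __name__ == "__main__":
    cities = [
        ["City", "Country", "Population"],
        ["Paris", "France", "2,148,000"],
        ["Rome", "Italy", "2,873,000"],
    ]
    layout = [["Menu"], ["Home", "About", "Contact"]]
    print(looks_relational(cities), looks_relational(layout))
```

A fixed rule like this is brittle: a two-column attribute/value listing such as `[["Name", "Alice"], ["Age", "30"]]` also passes it. Distinguishing such structures from relational ones is the kind of case a supervised classifier trained on labeled examples is meant to handle.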