Towards domain-independent information extraction from web tables
Proceedings of the 16th international conference on World Wide Web
Tables on web pages contain a large amount of semantically explicit information, which makes them a worthwhile target for automatic information extraction and knowledge acquisition from the Web. Extracting tables from web pages is difficult, however, because HTML was designed to convey visual rather than semantic information. In this paper, we propose a robust technique for table extraction from arbitrary web pages. The technique relies on the positional information of DOM element nodes as rendered in a browser and thereby separates the intricacies of code implementation from the intended visual appearance. Its novel aspect is the effective use of spatial reasoning on the CSS2 visual box model, which proves highly robust even without any form of learning (F-measure ≈ 90%). We describe the ideas behind our approach; the tabular pattern recognition algorithm, which operates on a double topographical grid structure and allows for effective and robust extraction; and general observations on web tables that any automatic web table extraction mechanism should take into account.
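To make the idea of reasoning over rendered positions rather than markup more concrete, the following is a minimal sketch in Python. It is not the paper's algorithm: the `Box` type, the `tol` pixel tolerance, and the functions `cluster`, `to_grid`, and `looks_tabular` are all hypothetical names introduced for illustration, and the grid here is a single plain grid rather than the double topographical grid structure the abstract mentions. The sketch assumes each visible element has already been reduced to a bounding box, and it treats a set of boxes as tabular when their left and top edges align into a fully occupied grid.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Box:
    """Hypothetical rendered bounding box of one DOM element (in pixels)."""
    text: str
    left: float
    top: float
    right: float
    bottom: float


def cluster(values: List[float], tol: float) -> List[float]:
    """Collapse coordinates lying within `tol` of a cluster's first value."""
    reps: List[float] = []
    for v in sorted(values):
        if not reps or v - reps[-1] > tol:
            reps.append(v)
    return reps


def index_of(value: float, reps: List[float], tol: float) -> int:
    """Return the index of the first representative within `tol` of `value`."""
    for i, r in enumerate(reps):
        if abs(value - r) <= tol:
            return i
    raise ValueError(f"{value} matches no grid line")


def to_grid(boxes: List[Box], tol: float = 2.0) -> List[List[Optional[str]]]:
    """Place boxes into rows and columns by aligned top and left edges."""
    cols = cluster([b.left for b in boxes], tol)
    rows = cluster([b.top for b in boxes], tol)
    grid: List[List[Optional[str]]] = [[None] * len(cols) for _ in rows]
    for b in boxes:
        r = index_of(b.top, rows, tol)
        c = index_of(b.left, cols, tol)
        grid[r][c] = b.text
    return grid


def looks_tabular(grid: List[List[Optional[str]]]) -> bool:
    """Assumed criterion: at least 2x2 and every grid cell is occupied."""
    if len(grid) < 2 or len(grid[0]) < 2:
        return False
    return all(cell is not None for row in grid for cell in row)


if __name__ == "__main__":
    # Four boxes with one pixel of jitter, as a renderer might report them.
    boxes = [
        Box("Name", 10, 10, 90, 30),
        Box("Price", 100, 11, 160, 30),
        Box("Widget", 10, 40, 90, 60),
        Box("9.99", 101, 40, 160, 60),
    ]
    grid = to_grid(boxes)
    print(grid)                 # [['Name', 'Price'], ['Widget', '9.99']]
    print(looks_tabular(grid))  # True
```

Because the sketch only looks at coordinates, it would give the same result whether the page used `<table>` markup, nested `<div>` elements, or absolutely positioned blocks, which is the separation between code and visual appearance that the abstract describes. A real extractor would also need to handle spanning cells, nested tables, and layout-only grids.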