Visually guided bottom-up table detection and segmentation in web documents

Authors:
Bernhard Krüpl; Marcus Herzog
Affiliations:
Vienna University of Technology; Vienna University of Technology
Venue:
Proceedings of the 15th international conference on World Wide Web
Year:
2006

Abstract

In the AllRight project, we are developing an algorithm for unsupervised table detection and segmentation that uses the visual rendition of a Web page rather than its HTML code. The algorithm works bottom-up: it groups word bounding boxes into progressively larger groups, relying on a set of heuristics. It has been implemented, and a preliminary evaluation on about 6000 Web documents has been carried out.
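
The bottom-up idea described in the abstract can be illustrated with a minimal sketch. This is not the AllRight algorithm, whose heuristics the abstract does not spell out. The `Box` type, the two grouping rules (same-row proximity, then left-edge alignment), and the thresholds `max_gap` and `tol` are all assumptions chosen for illustration. The single greedy pass also never re-merges cells once they are formed, which a real implementation would likely need to handle.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of a rendered word or word group (hypothetical type)."""
    x0: float
    y0: float
    x1: float
    y1: float
    text: str = ""


def v_overlap(a: Box, b: Box) -> bool:
    """True if the two boxes share some vertical extent, i.e. sit on the same row."""
    return min(a.y1, b.y1) - max(a.y0, b.y0) > 0


def h_gap(a: Box, b: Box) -> float:
    """Horizontal distance between two boxes (negative if they overlap)."""
    return max(a.x0, b.x0) - min(a.x1, b.x1)


def merge(a: Box, b: Box) -> Box:
    """Smallest box enclosing both inputs, with the texts concatenated."""
    return Box(min(a.x0, b.x0), min(a.y0, b.y0),
               max(a.x1, b.x1), max(a.y1, b.y1),
               (a.text + " " + b.text).strip())


def group_words(words: List[Box], max_gap: float = 8.0) -> List[Box]:
    """Step 1 (assumed heuristic): merge words on the same row whose
    horizontal gap is at most max_gap into candidate cells."""
    cells: List[Box] = []
    for w in sorted(words, key=lambda b: (b.y0, b.x0)):
        for i, c in enumerate(cells):
            if v_overlap(c, w) and h_gap(c, w) <= max_gap:
                cells[i] = merge(c, w)
                break
        else:
            cells.append(w)
    return cells


def group_columns(cells: List[Box], tol: float = 3.0) -> List[List[Box]]:
    """Step 2 (assumed heuristic): collect cells whose left edges align
    within tol into candidate columns, ordered top to bottom."""
    columns: List[List[Box]] = []
    for c in sorted(cells, key=lambda b: (b.x0, b.y0)):
        for col in columns:
            if abs(col[0].x0 - c.x0) <= tol:
                col.append(c)
                break
        else:
            columns.append([c])
    return columns


if __name__ == "__main__":
    words = [
        Box(0, 0, 30, 10, "Unit"), Box(34, 0, 70, 10, "price"),
        Box(120, 0, 160, 10, "Stock"),
        Box(0, 20, 28, 30, "9.99"), Box(120, 20, 135, 30, "12"),
    ]
    for column in group_columns(group_words(words)):
        print([cell.text for cell in column])
```

In this toy input, "Unit" and "price" are close enough to merge into one cell, while the wide gap before "Stock" keeps it separate. The resulting cells then fall into two left-aligned columns, which is the kind of regularity a table detector could look for.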