In the AllRight project, we are developing an algorithm for unsupervised table detection and segmentation that works on the visual rendition of a Web page rather than on its HTML code. The algorithm works bottom-up: it groups word bounding boxes into progressively larger groups, guided by a set of heuristics. It has been implemented, and a preliminary evaluation on about 6000 Web documents has been carried out.
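The abstract does not spell out the grouping heuristics. The following is a minimal sketch of bottom-up box grouping under one assumed, hypothetical proximity rule: two boxes merge when both their horizontal and vertical gaps fall under given thresholds. The names `Box` and `group_boxes` and all threshold values are illustrative assumptions, not the AllRight heuristics. A first pass joins words into text lines, and a second pass stacks those lines into column-like blocks.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in rendered-page coordinates (hypothetical units: pixels)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def union(self, other: "Box") -> "Box":
        # Smallest box enclosing both inputs.
        return Box(min(self.x0, other.x0), min(self.y0, other.y0),
                   max(self.x1, other.x1), max(self.y1, other.y1))


def gaps(a: Box, b: Box) -> Tuple[float, float]:
    """Horizontal and vertical gaps between two boxes (0 where they overlap on that axis)."""
    dx = max(0.0, max(a.x0, b.x0) - min(a.x1, b.x1))
    dy = max(0.0, max(a.y0, b.y0) - min(a.y1, b.y1))
    return dx, dy


def group_boxes(boxes: List[Box], max_dx: float, max_dy: float) -> List[Box]:
    """Repeatedly merge any pair of boxes whose gaps are within the thresholds.

    Assumed heuristic for illustration only; stops when no pair can be merged.
    """
    groups = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                dx, dy = gaps(groups[i], groups[j])
                if dx <= max_dx and dy <= max_dy:
                    groups[i] = groups[i].union(groups[j])
                    del groups[j]
                    merged = True
                    break
            if merged:
                break
    return groups


# Hypothetical word boxes: two text columns, two rows each.
words = [Box(0, 0, 30, 10), Box(34, 0, 60, 10), Box(200, 0, 230, 10),
         Box(0, 12, 25, 22), Box(200, 12, 240, 22)]

# Pass 1: words -> lines (small horizontal gap, must overlap vertically).
lines = group_boxes(words, max_dx=6, max_dy=0)
# Pass 2: lines -> blocks (must overlap horizontally, small vertical gap).
blocks = group_boxes(lines, max_dx=0, max_dy=4)
print(blocks)
```

Here the first pass yields four text lines and the second pass yields two blocks, one per column, which is the kind of column structure a table detector could then inspect. The naive pairwise loop is cubic in the worst case; a real implementation would likely use a spatial index, and the fixed thresholds would need to adapt to font size.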