TINTIN: a system for retrieval in text tables
DL '97 Proceedings of the second ACM international conference on Digital libraries
A flexible learning system for wrapping tables and lists in HTML documents
Proceedings of the 11th international conference on World Wide Web
A machine learning based approach for table detection on the web
Proceedings of the 11th international conference on World Wide Web
QuASM: a system for question answering using semi-structured data
Proceedings of the 2nd ACM/IEEE-CS joint conference on Digital libraries
Table extraction using conditional random fields
Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval
Layout & language: preliminary experiments in assigning logical structure to table cells
ANLC '97 Proceedings of the fifth conference on Applied natural language processing
Mining tables from large scale HTML texts
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
Layout and language: integrating spatial and linguistic knowledge for layout understanding tasks
COLING '00 Proceedings of the 18th conference on Computational linguistics - Volume 1
Transforming arbitrary tables into logical form with TARTAR
Data & Knowledge Engineering
Towards domain-independent information extraction from web tables
Proceedings of the 16th international conference on World Wide Web
Automatic hidden-web table interpretation, conceptualization, and semantic annotation
Data & Knowledge Engineering
Web Semantics: Science, Services and Agents on the World Wide Web
Mining for attributes and values in tables
Proceedings of the International Conference on Management of Emergent Digital EcoSystems
FACTO: a fact lookup engine based on web tables
Proceedings of the 20th international conference on World wide web
Table detection from plain text using machine learning and document structure
APWeb'06 Proceedings of the 8th Asia-Pacific Web conference on Frontiers of WWW Research and Development
Web table discrimination with composition of rich structural and content information
Applied Soft Computing
Information extraction from tables in web pages is a challenging problem because table formats are diverse and attribute names appear in many vocabulary variants. This paper presents a new approach to automated table extraction that exploits formatting cues in semi-structured HTML tables, learns lexical variants from training examples, and uses a vector space model to handle non-exact matches among labels. In experiments on a set of tables collected from 157 university web sites, the method achieved an information extraction performance of 91.4% F1-measure, showing the effectiveness of combining structural table parsing with example-based label learning.
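The abstract does not give the details of its label-matching step, so the following is only a minimal sketch of the general idea of matching a table label against learned lexical variants with a vector space model. It assumes plain term-frequency bag-of-words vectors and cosine similarity; the attribute names, the variants in `LEARNED_VARIANTS`, the `match_label` helper, and the 0.5 threshold are all hypothetical illustrations, not the paper's actual method or data.

```python
import math
import re
from collections import Counter


def tokenize(label: str) -> list[str]:
    # Lowercase the label and split on any run of non-alphanumeric characters
    return [t for t in re.split(r"[^a-z0-9]+", label.lower()) if t]


def vectorize(label: str) -> Counter:
    # Assumption: raw term-frequency vector, no IDF or other weighting
    return Counter(tokenize(label))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical attribute -> lexical variants, standing in for variants
# learned from annotated training tables
LEARNED_VARIANTS = {
    "phone": ["phone", "telephone number", "tel", "phone no"],
    "email": ["email", "e mail address", "email address"],
    "office": ["office", "office room", "room number"],
}


def match_label(label: str, variants=LEARNED_VARIANTS, threshold: float = 0.5):
    """Return the attribute whose closest learned variant best matches label,
    or None if no variant reaches the (hypothetical) similarity threshold."""
    query = vectorize(label)
    best_attr, best_score = None, 0.0
    for attr, examples in variants.items():
        for example in examples:
            score = cosine(query, vectorize(example))
            if score > best_score:
                best_attr, best_score = attr, score
    return best_attr if best_score >= threshold else None


if __name__ == "__main__":
    for raw in ["Telephone", "E-mail Address", "Fax"]:
        print(raw, "->", match_label(raw))
```

Here "Telephone" is not an exact training variant, but it shares a term with "telephone number" (cosine about 0.71), so it is still mapped to `phone`; "Fax" shares no terms with any variant and is left unmatched. A real system would likely add term weighting and a tuned threshold.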