Traditionally, information extraction from web tables has focused on small, more or less homogeneous corpora, often based on assumptions about the use of tags. The multitude of different HTML implementations of web tables makes these approaches difficult to scale. In this paper, we approach the problem of domain-independent information extraction from web tables by shifting our attention from the tree-based representation of webpages to a variation of the two-dimensional visual box model used by web browsers to display the information on the screen. The topological and style information thereby obtained allows us to fill the gap created by missing domain-specific knowledge about content and table templates. We believe that, in a future step, this approach can become the basis for a new way of large-scale knowledge acquisition from the current "Visual Web".
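The abstract contrasts the DOM tree with the visual box model but does not spell out an algorithm. As a rough illustration only, the Python sketch below assumes each table cell has already been rendered by a browser and reduced to an axis-aligned bounding box. It derives coarse per-axis topological relations between boxes and rebuilds a row/column grid from geometry alone, without looking at which HTML tags produced the layout. The `Box` type and all function names are hypothetical and not taken from the paper, and only topological information is used here (no style information).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical rendered cell: text label plus bounding box in screen pixels
    (y grows downward, as in browser coordinates)."""
    label: str
    x: float
    y: float
    w: float
    h: float

    @property
    def right(self) -> float:
        return self.x + self.w

    @property
    def bottom(self) -> float:
        return self.y + self.h


def interval_relation(a0: float, a1: float, b0: float, b1: float) -> str:
    """Coarse relation of half-open interval [a0, a1) to [b0, b1) on one axis."""
    if a1 <= b0:
        return "before"      # a lies entirely left of / above b
    if b1 <= a0:
        return "after"       # a lies entirely right of / below b
    if (a0, a1) == (b0, b1):
        return "equal"       # a and b are aligned on this axis
    if b0 <= a0 and a1 <= b1:
        return "during"      # a is nested inside b on this axis
    if a0 <= b0 and b1 <= a1:
        return "contains"    # b is nested inside a on this axis
    return "overlaps"


def spatial_relation(a: Box, b: Box) -> tuple[str, str]:
    """(horizontal, vertical) relation of box a to box b."""
    return (interval_relation(a.x, a.right, b.x, b.right),
            interval_relation(a.y, a.bottom, b.y, b.bottom))


def bands(intervals: list[tuple[float, float]]) -> list[tuple[float, float]]:
    """Merge overlapping 1-D intervals into disjoint bands (rows or columns)."""
    merged: list[tuple[float, float]] = []
    for start, end in sorted(intervals):
        if merged and start < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def to_grid(boxes: list[Box]) -> list[list[str | None]]:
    """Place boxes into a row/column grid using geometry only, not HTML tags.
    Assumes no cell spans several rows or columns."""
    rows = bands([(b.y, b.bottom) for b in boxes])
    cols = bands([(b.x, b.right) for b in boxes])

    def index(bands_: list[tuple[float, float]], start: float, end: float) -> int:
        # First band whose extent overlaps [start, end).
        return next(i for i, (s, e) in enumerate(bands_) if start < e and s < end)

    grid: list[list[str | None]] = [[None] * len(cols) for _ in rows]
    for b in boxes:
        grid[index(rows, b.y, b.bottom)][index(cols, b.x, b.right)] = b.label
    return grid


if __name__ == "__main__":
    cells = [Box("Name", 0, 0, 50, 20), Box("Age", 60, 0, 30, 20),
             Box("Ann", 0, 25, 50, 20), Box("34", 60, 25, 30, 20)]
    print(to_grid(cells))                         # [['Name', 'Age'], ['Ann', '34']]
    print(spatial_relation(cells[0], cells[1]))   # ('before', 'equal')
```

Intervals are treated as half-open, so boxes that merely touch are not counted as overlapping. A cell spanning several columns would merge those column bands into one. Handling such spans, along with style cues like borders and fonts, would need considerably more than this sketch.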