Data on the Web in HTML tables is mostly structured, but we usually do not know the structure in advance. Thus, we cannot directly query for data of interest. We propose a solution to this problem based on document-independent extraction ontologies. Our solution entails elements of table understanding, data integration, and wrapper creation. Table understanding allows us to find tables of interest within a Web page, recognize attributes and values within the table, pair attributes with values, and form records. Data-integration techniques allow us to match source records with a target schema. Ontologically specified wrappers allow us to extract data from source records into a target schema. Experimental results show that we can successfully locate data of interest in tables and map the data from source HTML tables with unknown structure to a given target database schema. We can thus "directly" query source data with unknown structure through a known target schema.
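To make the pipeline described above concrete, the following is a minimal sketch, not the authors' system. It assumes the simplest possible table layout (the first row holds the attribute labels and every later row is one record), and it replaces the extraction ontology with a hypothetical synonym dictionary, `TARGET_SYNONYMS`. The names `TableRows`, `form_records`, `map_to_target`, and the sample table are all illustrative assumptions.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of every <tr> in a document as a list of rows."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # row currently being read, or None
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def form_records(html):
    """Pair attributes with values, assuming the first row is the header."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Hypothetical stand-in for an extraction ontology:
# target-schema attribute -> source labels that may denote it.
TARGET_SYNONYMS = {
    "make": {"make", "manufacturer"},
    "price": {"price", "cost", "asking price"},
}


def map_to_target(record, synonyms=TARGET_SYNONYMS):
    """Keep only the values whose source label matches a target attribute."""
    mapped = {}
    for label, value in record.items():
        for target, names in synonyms.items():
            if label.strip().lower() in names:
                mapped[target] = value
    return mapped


# Illustrative source table whose labels differ from the target schema.
SAMPLE = """
<table>
  <tr><th>Manufacturer</th><th>Cost</th><th>Colour</th></tr>
  <tr><td>Ford</td><td>4500</td><td>red</td></tr>
  <tr><td>Honda</td><td>6200</td><td>blue</td></tr>
</table>
"""

records = [map_to_target(r) for r in form_records(SAMPLE)]
```

Here the source column "Colour" has no counterpart in the assumed target schema and is dropped, while "Manufacturer" and "Cost" are renamed to the target attributes. Real tables with nested headers, spanning cells, or attribute labels in the first column would need the fuller table-understanding step the abstract describes.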