Data on the Web in HTML tables is mostly structured, but we usually do not know the structure in advance. Thus, we cannot directly query for data of interest. We propose a solution to this problem based on document-independent extraction ontologies. The solution entails elements of table understanding, data integration, and wrapper creation. Table understanding allows us to recognize attributes and values, pair attributes with values, and form records. Data-integration techniques allow us to match source records with a target schema. Ontologically specified wrappers allow us to extract data from source records into a target schema. Experimental results show that we can successfully map data of interest from source HTML tables with unknown structure to a given target database schema. We can thus "directly" query source data with unknown structure through a known target schema.
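The three steps named above (table understanding, schema matching, and extraction into a target schema) can be illustrated with a minimal Python sketch. It is not the authors' system: the `TARGET_SCHEMA` synonym dictionary is a hypothetical, much-simplified stand-in for an extraction ontology, the sample table and all function names are invented for illustration, and the sketch assumes the first table row holds the attribute labels, which real tables with unknown structure do not guarantee.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the cell text of every <tr> row in an HTML document."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # row currently being read, or None
        self._cell = None   # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical stand-in for an extraction ontology:
# target attribute -> source labels that should map to it.
TARGET_SCHEMA = {
    "make": {"make", "manufacturer", "brand"},
    "year": {"year", "yr", "model year"},
    "price": {"price", "cost", "asking price"},
}


def form_records(rows):
    """Table understanding (simplified): pair each value with its attribute.

    Assumption: the first row contains the attribute labels.
    """
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def map_to_target(record, schema=TARGET_SCHEMA):
    """Schema matching and extraction: keep only values whose source label
    matches a target attribute; unmatched columns are dropped."""
    mapped = {}
    for label, value in record.items():
        for target, synonyms in schema.items():
            if label.strip().lower() in synonyms:
                mapped[target] = value
    return mapped


def extract(html):
    """Run the whole sketch: parse, form records, map to the target schema."""
    parser = TableParser()
    parser.feed(html)
    return [map_to_target(record) for record in form_records(parser.rows)]


# Invented sample table whose labels differ from the target attribute names.
SAMPLE = """
<table>
  <tr><th>Manufacturer</th><th>Yr</th><th>Asking Price</th><th>Color</th></tr>
  <tr><td>Honda</td><td>1998</td><td>$4,500</td><td>red</td></tr>
  <tr><td>Ford</td><td>2001</td><td>$7,200</td><td>blue</td></tr>
</table>
"""

records = extract(SAMPLE)
for record in records:
    print(record)
```

Once the source records are expressed in the target attributes `make`, `year`, and `price`, they can be queried without knowing the source table's labels, which is the sense in which a known target schema allows "direct" querying. The label lookup here is deliberately naive; it does not attempt the value-based recognition or nested-table handling that tables of unknown structure would require.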