AMBER: turning annotations into knowledge

Authors:
Cheng Wang
Affiliations:
University of Oxford, Oxford, United Kingdom
Venue:
Proceedings of the 21st international conference companion on World Wide Web
Year:
2012

Abstract

Web extraction is the task of turning unstructured HTML into knowledge. Computers can generate annotations of unstructured HTML, but the more important step is turning those annotations into structured knowledge. Unfortunately, current systems that extract knowledge from result pages lack accuracy. In this proposal, we present AMBER, a fully automated system that turns annotations into structured knowledge for any result page of a given domain. AMBER observes basic domain attributes on a page and leverages repeated occurrences of similar attributes to group related attributes into records. This contrasts with previous approaches, which analyze only the repeated structure of the HTML because no domain knowledge is available to them. Our multi-domain experimental evaluation on hundreds of sites demonstrates that AMBER achieves accuracy (98%) comparable to that of a skilled human annotator.
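The idea of grouping attributes into records by their repeated occurrence can be illustrated with a toy sketch. This is not AMBER's algorithm, which the abstract does not specify; it assumes annotations arrive as a flat list of (attribute type, value) pairs in document order, and it starts a new record whenever an attribute type repeats. The function name `group_into_records`, the attribute names, and the sample values are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical input shape: (attribute type, text value) in document order.
Annotation = Tuple[str, str]


def group_into_records(annotations: List[Annotation]) -> List[Dict[str, str]]:
    """Group annotations into records, closing the current record
    whenever an attribute type is seen a second time."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for attr_type, value in annotations:
        if attr_type in current:
            # A repeated attribute type signals the start of the next record.
            records.append(current)
            current = {}
        current[attr_type] = value
    if current:
        records.append(current)
    return records


# Illustrative annotations from an imagined property-listing result page.
annotations = [
    ("price", "250000"), ("location", "Oxford"), ("bedrooms", "3"),
    ("price", "180000"), ("location", "Cowley"),
    ("price", "320000"), ("location", "Headington"), ("bedrooms", "4"),
]
records = group_into_records(annotations)
```

Here the second listing has no `bedrooms` annotation, yet it still forms its own record because the repeated `price` marks the boundary. A purely structural approach would need the HTML template to reveal that boundary instead.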