Unsupervised wrapper induction using linked data

Authors:
Anna Lisa Gentile;Ziqi Zhang;Isabelle Augenstein;Fabio Ciravegna
Affiliations:
University of Sheffield, Sheffield, United Kingdom;University of Sheffield, Sheffield, United Kingdom;University of Sheffield, Sheffield, United Kingdom;University of Sheffield, Sheffield, United Kingdom
Venue:
Proceedings of the seventh international conference on Knowledge capture
Year:
2013

Abstract

This work explores the use of Linked Data for Web-scale Information Extraction and shows encouraging results on the task of Wrapper Induction. We propose a simple knowledge-based method that (i) is highly flexible with respect to different domains and (ii) requires no training material, instead exploiting Linked Data as a background knowledge source to build the essential learning resources. The major contribution of this work is a study of how Linked Data, an imprecise, redundant and large-scale knowledge resource, can be used to support Web-scale Information Extraction effectively and efficiently, together with an identification of the challenges involved. We show that, for the domains it covers, Linked Data serves as a powerful knowledge resource for Information Extraction. Experiments on a publicly available dataset demonstrate that, under certain conditions, this simple unsupervised approach achieves results competitive with more complex state-of-the-art methods that always depend on training data.
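
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way background knowledge could stand in for training data in wrapper induction; it is not the authors' implementation. It assumes that entity labels taken from Linked Data have already been collected into a plain list (the hypothetical `gazetteer`), and that each page has been reduced to a mapping from node path to text (the hypothetical `pages`). The function names `induce_wrapper` and `extract` are illustrative.

```python
from collections import Counter


def induce_wrapper(pages, gazetteer):
    """Pick the node path whose text most often matches a known label.

    `pages` is a list of dicts mapping a node path to its text.
    `gazetteer` is a list of labels assumed to come from Linked Data.
    Both structures are assumptions made for this sketch.
    """
    known = {label.strip().lower() for label in gazetteer}
    hits = Counter()
    for page in pages:
        for path, text in page.items():
            if text.strip().lower() in known:
                hits[path] += 1
    if not hits:
        return None  # the gazetteer does not cover this site
    return hits.most_common(1)[0][0]


def extract(pages, path):
    """Apply the induced path to every page, including unmatched ones."""
    return [page[path] for page in pages if path in page]


# Toy data: only two of the three titles appear in the gazetteer.
gazetteer = ["Dune", "Neuromancer"]
pages = [
    {"/html/body/h1": "Dune", "/html/body/div[2]/span": "Frank Herbert"},
    {"/html/body/h1": "Neuromancer", "/html/body/div[2]/span": "William Gibson"},
    {"/html/body/h1": "Hyperion", "/html/body/div[2]/span": "Dan Simmons"},
]

best_path = induce_wrapper(pages, gazetteer)
titles = extract(pages, best_path)
```

In this toy setting the path learned from the two covered pages also extracts the title of the third page, which the gazetteer does not contain. This illustrates why incomplete background knowledge can still be useful, and why an empty result is expected for domains the knowledge source does not cover.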