This work explores the use of Linked Data for Web-scale Information Extraction and shows encouraging results on the task of Wrapper Induction. We propose a simple knowledge-based method that (i) is highly flexible with respect to different domains and (ii) does not require any training material, instead exploiting Linked Data as a background knowledge source to build essential learning resources. The major contribution of this work is a study of how Linked Data, an imprecise, redundant and large-scale knowledge resource, can be used to support Web-scale Information Extraction effectively and efficiently, together with an identification of the challenges involved. We show that, for domains that are covered, Linked Data serve as a powerful knowledge resource for Information Extraction. Experiments on a publicly available dataset demonstrate that, under certain conditions, this simple unsupervised approach can achieve results competitive with some complex state-of-the-art methods that always depend on training data.
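To make the idea of using background knowledge in place of training data more concrete, the following is a minimal illustrative sketch, not the method evaluated in this work. It assumes that a set of known attribute values (here called `gazetteer`) has already been gathered from a Linked Data source, and that each page has been reduced to a mapping from a node path to its text. The function names, the page representation and the sample values are all hypothetical. The sketch picks the path whose text most often matches a known value and then applies that path to every page, including pages whose values are not covered by the background knowledge.

```python
from collections import Counter


def normalize(text):
    """Lowercase and collapse whitespace so matching tolerates small differences."""
    return " ".join(text.lower().split())


def induce_wrapper(pages, gazetteer):
    """Return the node path whose text most often matches a gazetteer entry.

    pages: list of dicts mapping a node path (string) to the node's text.
    gazetteer: iterable of known values, assumed to come from Linked Data.
    """
    known = {normalize(value) for value in gazetteer}
    hits = Counter()
    for page in pages:
        for path, text in page.items():
            if normalize(text) in known:
                hits[path] += 1
    if not hits:
        return None  # background knowledge does not cover these pages
    return hits.most_common(1)[0][0]


def extract(pages, path):
    """Apply the induced path to every page that contains it."""
    return [page[path] for page in pages if path in page]


# Hypothetical background knowledge: two book titles.
gazetteer = {"Dune", "Neuromancer"}

# Hypothetical pages from one site; the third page has an uncovered title
# and a spurious gazetteer match in another node.
pages = [
    {"/html/body/h1": "Dune", "/html/body/div/span": "Frank Herbert"},
    {"/html/body/h1": "Neuromancer", "/html/body/div/span": "William Gibson"},
    {"/html/body/h1": "Hyperion", "/html/body/div/span": "Dune"},
]

best_path = induce_wrapper(pages, gazetteer)
titles = extract(pages, best_path)
```

Voting across pages is one simple way to tolerate an imprecise knowledge source: the spurious match on the third page is outvoted, and the induced path still extracts the title that the gazetteer does not contain.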