Web-based open-domain information extraction

Authors:
Marius Pasca
Affiliations:
Google Inc., Mountain View, CA, USA
Venue:
Proceedings of the 20th ACM international conference on Information and knowledge management
Year:
2011

Abstract

This tutorial provides an overview of extraction methods developed in the area of Web-based open-domain information extraction, whose purpose is the acquisition of open-domain classes, instances and relations from Web text. The extraction methods operate over unstructured or semi-structured text. They take advantage of weak supervision provided in the form of seed examples or small amounts of annotated data, or draw upon knowledge already encoded within resources created strictly by experts or collaboratively by users. The tutorial teaches the audience about existing resources that include instances and relations; details of methods for extracting such data from structured and semi-structured text available on the Web; and strengths and limitations of resources extracted from text as part of recent literature, with applications in knowledge discovery and information retrieval.
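
As an illustration of the weak supervision from seed examples mentioned above, the following is a minimal sketch, not the method of any particular system covered in the tutorial. It assumes a single Hearst-style "C such as X, Y and Z" pattern, a hypothetical toy corpus in place of Web text, and instances that are single capitalized words. All names (`SENTENCES`, `extract_pairs`, `expand_from_seeds`) are invented for the example.

```python
import re
from collections import Counter

# Hypothetical toy corpus standing in for Web text (assumption, not real data).
SENTENCES = [
    "Cities such as Paris, London and Berlin attract many visitors.",
    "European capitals such as Paris, Rome and Madrid are well connected.",
    "Cities such as Rome, London and Vienna have long histories.",
    "Composers such as Bach, Mozart and Haydn wrote many symphonies.",
]

# One Hearst-style pattern: "<class> such as <Inst>, <Inst> and <Inst>".
# Instances are assumed to be single capitalized words for simplicity.
PATTERN = re.compile(
    r"^(?P<cls>.+?) such as "
    r"(?P<items>[A-Z][a-z]+(?:(?:, | and )[A-Z][a-z]+)*)"
)


def extract_pairs(sentences):
    """Return (class label, instance) pairs matched by the pattern."""
    pairs = []
    for sentence in sentences:
        match = PATTERN.search(sentence)
        if match is None:
            continue
        cls = match.group("cls").lower()
        for instance in re.split(r", | and ", match.group("items")):
            pairs.append((cls, instance))
    return pairs


def expand_from_seeds(sentences, seeds):
    """Count new instances that share a class label with any seed instance."""
    pairs = extract_pairs(sentences)
    # Class labels under which at least one seed instance was observed.
    seed_classes = {cls for cls, inst in pairs if inst in seeds}
    # Non-seed instances found under those labels, with their frequencies.
    return Counter(
        inst for cls, inst in pairs
        if cls in seed_classes and inst not in seeds
    )


candidates = expand_from_seeds(SENTENCES, seeds={"Paris"})
for instance, count in candidates.most_common():
    print(instance, count)
```

With the single seed "Paris", the sketch collects instances that co-occur with it under the labels "cities" and "european capitals", and ignores the unrelated "composers" sentence. Systems that work over real Web text would typically need many patterns, noise filtering, and more careful ranking than raw counts.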