Information extraction as a filtering task

Authors:
Henning Wachsmuth;Benno Stein;Gregor Engels
Affiliations:
University of Paderborn, s-lab, Paderborn, Germany;Bauhaus-Universität Weimar, Weimar, Germany;University of Paderborn, s-lab, Paderborn, Germany
Venue:
Proceedings of the 22nd ACM international conference on information & knowledge management
Year:
2013

Abstract

Information extraction is usually approached as an annotation task: input texts run through several analysis steps of an extraction process, in which different semantic concepts are annotated and matched against the slots of templates. We argue that such an approach lacks efficient control over the input of the analysis steps. In this paper, we therefore propose and evaluate a model and a formal approach that consistently put the filtering view at the center: before spending annotation effort, keep only those portions of the input texts that may contain relevant information for filling a template, and discard the others. We model all dependencies between the semantic concepts sought with a truth maintenance system, which then efficiently infers the portions of text to be annotated in each analysis step. The filtering view enables an information extraction system (1) to annotate only relevant portions of input texts and (2) to easily trade its run-time efficiency for its recall. We provide our approach as an open-source extension of Apache UIMA and demonstrate its potential in a number of experiments.
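
The filtering view described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' truth maintenance system and does not use Apache UIMA. It assumes a purely conjunctive template (every concept must occur in the same sentence) and uses hypothetical regex-based annotators (`has_money`, `has_year`, `has_forecast_cue`) invented for illustration. Each analysis step receives only the sentences that passed all earlier steps, so later annotators do less work.

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical annotators (assumptions for illustration only):
# each returns True if the sentence contains its concept.
def has_money(sentence: str) -> bool:
    return re.search(r"[$€]\s?\d", sentence) is not None

def has_year(sentence: str) -> bool:
    return re.search(r"\b(19|20)\d{2}\b", sentence) is not None

def has_forecast_cue(sentence: str) -> bool:
    pattern = r"\b(will|expects?|forecasts?|predicts?)\b"
    return re.search(pattern, sentence, re.IGNORECASE) is not None

Step = Tuple[str, Callable[[str], bool]]

def filter_pipeline(sentences: List[str],
                    steps: List[Step]) -> Tuple[List[str], Dict[str, int]]:
    """Run the steps in order, passing on only the still-relevant sentences.

    Returns the sentences that contain all concepts, plus the number of
    sentences each step had to analyze.
    """
    relevant = list(sentences)
    analyzed: Dict[str, int] = {}
    for name, annotate in steps:
        analyzed[name] = len(relevant)  # input size of this step
        # Discard every sentence that can no longer fill the template.
        relevant = [s for s in relevant if annotate(s)]
    return relevant, analyzed

STEPS: List[Step] = [
    ("money", has_money),
    ("year", has_year),
    ("forecast", has_forecast_cue),
]

SENTENCES = [
    "The company expects revenue of $5 billion in 2014.",
    "Revenue was $3 billion in 2012.",
    "The weather was nice.",
    "Analysts predict growth in 2015.",
]

relevant, analyzed = filter_pipeline(SENTENCES, STEPS)
print(relevant)
print(analyzed)
```

In this toy run the three steps analyze 4, 2, and 2 sentences, for a total of 8 annotator calls instead of the 12 needed when every step processes every sentence. Filtering on larger units such as paragraphs would be one plausible way to keep more candidates at the cost of more annotation work, but that choice is an assumption of this sketch rather than a detail stated in the abstract.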