Constructing efficient information extraction pipelines

Authors:
Henning Wachsmuth;Benno Stein;Gregor Engels
Affiliations:
Universität Paderborn, s-lab, Paderborn, Germany;Bauhaus-Universität Weimar, Weimar, Germany;Universität Paderborn, s-lab, Paderborn, Germany
Venue:
Proceedings of the 20th ACM international conference on Information and knowledge management
Year:
2011

Citing 10
Cited 3

Fast decoding and optimal decoding for machine translation

ACL '01 Proceedings of the 39th Annual Meeting on Association for Computational Linguistics
KnowItNow: fast, scalable information extraction from the web

HLT '05 Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing
Extremely fast text feature extraction for classification and indexing

Proceedings of the 17th ACM conference on Information and knowledge management
Design challenges and misconceptions in named entity recognition

CoNLL '09 Proceedings of the Thirteenth Conference on Computational Natural Language Learning
k-best A* parsing

ACL '09 Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2
Components for information extraction: ontology-based information extractors and generic platforms

CIKM '10 Proceedings of the 19th ACM international conference on Information and knowledge management
Very high accuracy and fast dependency parsing is not a contradiction

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Efficient statement identification for automatic market forecasting

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics
Coarse-to-fine natural language processing

Coarse-to-fine natural language processing
A high-performance syntactic and semantic dependency parser

COLING '10 Proceedings of the 23rd International Conference on Computational Linguistics: Demonstrations

In praise of laziness: a lazy strategy for web information extraction

ECIR'12 Proceedings of the 34th European conference on Advances in Information Retrieval
Automatic pipeline construction for real-time annotation

CICLing'13 Proceedings of the 14th international conference on Computational Linguistics and Intelligent Text Processing - Volume Part I
Information extraction as a filtering task

Proceedings of the 22nd ACM international conference on Information & knowledge management

Abstract

Information Extraction (IE) pipelines analyze text through several stages. The pipeline's algorithms determine both its effectiveness and its run-time efficiency. In real-world tasks, however, IE pipelines often fail to achieve acceptable run-times because they analyze too much task-irrelevant text. This raises two interesting questions: 1) How much "efficiency potential" lies in the scheduling of a pipeline's algorithms? 2) Is it possible to devise a reliable method to construct efficient IE pipelines? Both questions are addressed in this paper. In particular, we show how to optimize the run-time efficiency of IE pipelines for a given set of algorithms. We evaluate pipelines for three algorithm sets on an industrially relevant task: the extraction of market forecasts from news articles. Using a system-independent measure, we demonstrate that efficiency gains of up to one order of magnitude are possible without compromising a pipeline's original effectiveness.
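
The scheduling question raised in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method. It only shows why the order of filtering stages affects run-time when each stage discards task-irrelevant text before later stages run. The stage names, costs, keep rates, and prerequisites are hypothetical, and the cost model assumes that each stage keeps a fixed fraction of its input regardless of what ran before it.

```python
from itertools import permutations

# Hypothetical stages (assumed values, not taken from the paper):
# name -> (run-time per text unit, fraction of text units kept, prerequisite stages)
STAGES = {
    "time_recognizer": (1.0, 0.4, set()),
    "money_recognizer": (2.0, 0.3, set()),
    "forecast_classifier": (5.0, 0.5, {"time_recognizer"}),
}


def expected_cost(order, stages):
    """Expected run-time per input unit if each stage only sees text kept so far."""
    remaining = 1.0  # fraction of the input still under analysis
    total = 0.0
    for name in order:
        cost, kept, _ = stages[name]
        total += remaining * cost  # the stage runs only on the remaining text
        remaining *= kept          # the stage filters out irrelevant text
    return total


def is_admissible(order, stages):
    """A schedule is admissible if every stage runs after its prerequisites."""
    seen = set()
    for name in order:
        if not stages[name][2] <= seen:
            return False
        seen.add(name)
    return True


def best_schedule(stages):
    """Exhaustively search all admissible schedules (feasible for small sets only)."""
    admissible = (o for o in permutations(stages) if is_admissible(o, stages))
    return min(admissible, key=lambda o: expected_cost(o, stages))


if __name__ == "__main__":
    for order in permutations(STAGES):
        if is_admissible(order, STAGES):
            print(order, round(expected_cost(order, STAGES), 3))
    print("best:", best_schedule(STAGES))
```

Under these assumed numbers, the three admissible schedules differ noticeably in expected cost although they run the same algorithms and keep the same text. Exhaustive search is used only for clarity; it grows factorially with the number of stages.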