Collective information extraction with context-specific consistencies

Authors:
Peter Kluegl; Martin Toepfer; Florian Lemmerich; Andreas Hotho; Frank Puppe
Affiliations:
Department of Computer Science VI, University of Würzburg, Würzburg, Germany (all authors); Comprehensive Heart Failure Center, University of Würzburg, Würzburg, Germany (Peter Kluegl)
Venue:
ECML PKDD '12: Proceedings of the 2012 European Conference on Machine Learning and Knowledge Discovery in Databases, Volume Part I
Year:
2012

Abstract

Conditional Random Fields (CRFs) have been widely used for information extraction from free text as well as from semi-structured documents. In semi-structured domains, the entities of interest are often structured consistently within a certain context or document; however, their actual composition varies and may be inconsistent across different contexts. We present two collective information extraction approaches based on CRFs that exploit these context-specific consistencies. The first approach extends linear-chain CRFs with additional factors specified by a classifier that learns such consistencies during inference. In a second, extended approach, we propose a variant of skip-chain CRFs that enables the model to transfer long-range evidence about the consistency of the entities. An empirical study highlights the practical relevance of this work for real-world information extraction systems: both approaches achieve a considerable error reduction.
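The abstract describes the first approach only at a high level. As a rough, hypothetical illustration of the underlying idea (not the authors' model), the Python sketch below decodes a linear-chain model with Viterbi, lets a simple per-document estimator learn which label each token "shape" tends to receive within that document, and then decodes again with those estimates added as extra unary factors. The frequency-count estimator is a stand-in for the paper's classifier; the names `viterbi`, `consistency_factors`, and `collective_decode`, the shape feature, and all scores are assumptions chosen for illustration, and no CRF weights are learned here.

```python
import math
from collections import Counter, defaultdict


def viterbi(emissions, transitions, labels):
    """Highest-scoring label sequence of a linear-chain model.

    emissions[t][y]: unary score of label y at position t
    transitions[(y_prev, y)]: pairwise score; missing pairs score 0.0
    """
    if not emissions:
        return []
    score = [{y: emissions[0][y] for y in labels}]
    back = [{}]
    for t in range(1, len(emissions)):
        score.append({})
        back.append({})
        for y in labels:
            # Best predecessor label for y at position t.
            prev = max(labels, key=lambda p: score[t - 1][p] + transitions.get((p, y), 0.0))
            score[t][y] = score[t - 1][prev] + transitions.get((prev, y), 0.0) + emissions[t][y]
            back[t][y] = prev
    # Follow back-pointers from the best final label.
    path = [max(labels, key=lambda y: score[-1][y])]
    for t in range(len(emissions) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


def consistency_factors(shapes, predicted, labels, weight=1.0, smoothing=1.0):
    """Hypothetical stand-in for the paper's classifier: estimate P(label | shape)
    from the first-pass labels of THIS document only (add-one smoothed) and
    return weight * log P as an extra unary factor for every position."""
    counts = defaultdict(Counter)
    for shape, y in zip(shapes, predicted):
        counts[shape][y] += 1
    factors = []
    for shape in shapes:
        total = sum(counts[shape].values()) + smoothing * len(labels)
        factors.append({y: weight * math.log((counts[shape][y] + smoothing) / total)
                        for y in labels})
    return factors


def collective_decode(emissions, transitions, shapes, labels, weight=1.0):
    """Two passes: plain Viterbi, then Viterbi with document-local consistency factors."""
    first = viterbi(emissions, transitions, labels)
    extra = consistency_factors(shapes, first, labels, weight)
    combined = [{y: e[y] + f[y] for y in labels} for e, f in zip(emissions, extra)]
    return first, viterbi(combined, transitions, labels)


LABELS = ["AUTHOR", "TITLE"]
# Toy document "Smith, Jones, graphs / Lee, Brown, models" with made-up scores;
# "Brown," is ambiguous and its local score slightly prefers TITLE.
shapes = ["Xx,", "Xx,", "x", "Xx,", "Xx,", "x"]
emissions = [
    {"AUTHOR": 2.0, "TITLE": 0.0},
    {"AUTHOR": 2.0, "TITLE": 0.0},
    {"AUTHOR": 0.0, "TITLE": 2.0},
    {"AUTHOR": 2.0, "TITLE": 0.0},
    {"AUTHOR": 0.0, "TITLE": 0.3},
    {"AUTHOR": 0.0, "TITLE": 2.0},
]
transitions = {("AUTHOR", "AUTHOR"): 0.1, ("TITLE", "TITLE"): 0.1}
first_pass, second_pass = collective_decode(emissions, transitions, shapes, LABELS)
print(first_pass)   # ['AUTHOR', 'AUTHOR', 'TITLE', 'AUTHOR', 'TITLE', 'TITLE']
print(second_pass)  # ['AUTHOR', 'AUTHOR', 'TITLE', 'AUTHOR', 'AUTHOR', 'TITLE']
```

In this toy run, the first pass labels "Brown," as TITLE, and the second pass relabels it AUTHOR because the extra factor favors the label that its shape ("Xx,") mostly receives elsewhere in the same document. A skip-chain variant would instead connect similar tokens with pairwise factors; such loopy graphs typically require approximate inference (for example, loopy belief propagation), which this sketch does not attempt.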