Collective information extraction with context-specific consistencies

  • Authors:
  • Peter Kluegl;Martin Toepfer;Florian Lemmerich;Andreas Hotho;Frank Puppe

  • Affiliations:
  • Department of Computer Science VI, University of Würzburg, Würzburg, Germany, Comprehensive Heart Failure Center, University of Würzburg, Würzburg, Germany;Department of Computer Science VI, University of Würzburg, Würzburg, Germany;Department of Computer Science VI, University of Würzburg, Würzburg, Germany;Department of Computer Science VI, University of Würzburg, Würzburg, Germany;Department of Computer Science VI, University of Würzburg, Würzburg, Germany

  • Venue:
  • ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part I
  • Year:
  • 2012

Abstract

Conditional Random Fields (CRFs) have been widely used for information extraction from free text as well as from semi-structured documents. Interesting entities in semi-structured domains are often consistently structured within a certain context or document. However, their actual compositions vary and are possibly inconsistent among different contexts. We present two collective information extraction approaches based on CRFs for exploiting these context-specific consistencies. The first approach extends linear-chain CRFs by additional factors specified by a classifier, which learns such consistencies during inference. In the second, extended approach, we propose a variant of skip-chain CRFs, which enables the model to transfer long-range evidence about the consistency of the entities. The practical relevance of the presented work for real-world information extraction systems is highlighted in an empirical study. Both approaches achieve a considerable error reduction.
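
As a rough illustration of the long-range connections that a skip-chain model adds on top of a linear chain, the sketch below pairs up token positions within a single document that share a key. It is not the variant proposed in the paper: the function name `skip_edges`, the example document, and the rule of linking identical capitalized tokens (a common heuristic for generic skip-chain CRFs) are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def skip_edges(tokens, key=lambda t: t if t[:1].isupper() else None):
    """Return index pairs (i, j), i < j, whose tokens share the same non-None key.

    Hypothetical helper: the default key links identical capitalized tokens,
    which is an assumed heuristic, not the consistency criterion of the paper.
    """
    groups = defaultdict(list)
    for i, tok in enumerate(tokens):
        k = key(tok)
        if k is not None:
            groups[k].append(i)

    edges = []
    for positions in groups.values():
        # Every pair of positions in a group becomes one long-range edge.
        edges.extend(combinations(positions, 2))
    return sorted(edges)


# One "context" (here, a made-up fragment of a reference list).
doc = ["Kluegl", ",", "P.", "and", "Puppe", ",", "F.", ";", "Kluegl", ",", "P."]
edges = skip_edges(doc)
print(edges)
```

In a skip-chain model, each such pair would receive an additional factor so that evidence about the label at one position can influence the other. Passing a different `key` function (for example, one based on formatting or punctuation features) changes which positions are treated as related.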