Using gazetteers in discriminative information extraction

  • Authors: Andrew Smith; Miles Osborne
  • Affiliations: University of Edinburgh, United Kingdom (both authors)
  • Venue: CoNLL-X '06: Proceedings of the Tenth Conference on Computational Natural Language Learning
  • Year: 2006

Abstract

Much work on information extraction has successfully used gazetteers to recognise uncommon entities that cannot be reliably identified from local context alone. Approaches to such tasks often involve the use of maximum entropy-style models, where gazetteers usually appear as highly informative features in the model. Although such features can improve model accuracy, they can also introduce hidden negative effects. In this paper we describe and analyse these effects and suggest ways in which they may be overcome. In particular, we show that by quarantining gazetteer features and training them in a separate model, then decoding using a logarithmic opinion pool (Smith et al., 2005), we may achieve much higher accuracy. Finally, we suggest ways in which other features with gazetteer feature-like behaviour may be identified.
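
As a rough illustration of the decoding step mentioned in the abstract, a logarithmic opinion pool combines the distributions of several models as a weighted product, renormalised over the labels. The sketch below is a minimal Python example under simplifying assumptions that are not taken from the paper: it pools per-token label distributions rather than full sequence models, and the function name log_opinion_pool, the label set, the probabilities and the equal weights are all hypothetical.

```python
import math


def log_opinion_pool(distributions, weights):
    """Pool label distributions as a weighted geometric mean, renormalised.

    Assumes every distribution covers the same labels and that all
    probabilities are strictly positive (math.log(0) would raise).
    """
    if len(distributions) != len(weights):
        raise ValueError("need exactly one weight per distribution")
    labels = list(distributions[0])
    # Weighted sum of log-probabilities for each label.
    log_scores = {
        y: sum(w * math.log(p[y]) for p, w in zip(distributions, weights))
        for y in labels
    }
    # Subtract the maximum before exponentiating, for numerical stability.
    top = max(log_scores.values())
    unnormalised = {y: math.exp(s - top) for y, s in log_scores.items()}
    z = sum(unnormalised.values())
    return {y: v / z for y, v in unnormalised.items()}


# Hypothetical outputs for one token from two separately trained models:
# one using only standard features, one using only gazetteer features.
standard_model = {"PER": 0.5, "ORG": 0.3, "O": 0.2}
gazetteer_model = {"PER": 0.2, "ORG": 0.7, "O": 0.1}

pooled = log_opinion_pool([standard_model, gazetteer_model], [0.5, 0.5])
best_label = max(pooled, key=pooled.get)
```

With these made-up numbers the pooled distribution prefers ORG, because a label scores highly under the product only when neither model assigns it a low probability. How the weights are chosen and how pooling interacts with sequence decoding are left out of this sketch.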