Feature-based models for improving the quality of noisy training data for relation extraction

  • Authors:
  • Benjamin Roth;Dietrich Klakow

  • Affiliations:
  • Saarland University, Saarbrucken, Germany;Saarland University, Saarbrucken, Germany

  • Venue:
  • Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Supervised relation extraction from text relies on annotated data. Distant supervision is a scheme to obtain noisy training data by using a knowledge base of relational tuples as the ground truth and finding entity pair matches in a text corpus. We propose and evaluate two feature-based models for increasing the quality of distant supervision extraction patterns. The first model is an extension of a hierarchical topic model that induces background, relation specific and argument-pair specific feature distributions. The second model is a perceptron, trained to match an objective function that enforces two constraints: 1) an at-least-one semantics, i.e. at least one training example per relational tuple is assumed to be correct; 2) high scores for a dedicated NIL label that accounts for the noise in the training data. For both algorithms, neither explicit negative data nor the ratio of negatives has to be provided. Both algorithms give improvements over a maximum likelihood baseline as well as over a previous topic model without features, evaluated on TAC KBP data.