Large-Scale Learning of Relation-Extraction Rules with Distant Supervision from the Web

  • Authors:
  • Sebastian Krause, Hong Li, Hans Uszkoreit, Feiyu Xu

  • Affiliations:
  • Language Technology Lab, DFKI, Berlin, Germany (all authors)

  • Venue:
  • ISWC'12: Proceedings of the 11th International Conference on The Semantic Web, Volume Part I
  • Year:
  • 2012

Abstract

We present a large-scale relation extraction (RE) system which learns grammar-based RE rules from the Web by utilizing large numbers of relation instances as seeds. Our goal is to obtain rule sets large enough to cover the actual range of linguistic variation, thus tackling the long-tail problem of real-world applications. A variant of distant supervision learns several relations in parallel, enabling a new method of rule filtering. The system detects both binary and n-ary relations. We target 39 relations from Freebase, for which 3M sentences extracted from 20M web pages serve as the basis for learning an average of 40K distinctive rules per relation. Employing an efficient dependency parser, the average run time for each relation is only 19 hours. We compare these rules with ones learned from local corpora of different sizes and demonstrate that the Web is indeed needed for a good coverage of linguistic variation.
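To make the two core ideas of the abstract concrete, here is a minimal sketch of distant-supervision rule learning with cross-relation filtering. This is not the authors' grammar-based system (which learns dependency-parse rules): here a seed is a (relation, entity-pair) tuple, a candidate "rule" is just the token span between the two seed arguments in a matching sentence, and learning several relations in parallel lets us discard patterns that fire for more than one relation. All function names and data are illustrative.

```python
# Illustrative sketch of distant supervision: seeds are known relation
# instances; any sentence containing both arguments of a seed yields a
# candidate extraction pattern for that relation.
from collections import defaultdict

def candidate_rules(seeds, sentences):
    """Map each relation to the set of surface patterns found for its seeds."""
    rules = defaultdict(set)
    for relation, (arg1, arg2) in seeds:
        for sent in sentences:
            if arg1 in sent and arg2 in sent:
                i, j = sent.index(arg1), sent.index(arg2)
                if i < j:
                    start, end = i + len(arg1), j
                else:
                    start, end = j + len(arg2), i
                # pattern = text between the two seed arguments
                pattern = sent[start:end].strip().lower()
                if pattern:
                    rules[relation].add(pattern)
    return dict(rules)

def filter_ambiguous(rules):
    """Cross-relation filtering (enabled by learning relations in
    parallel): drop any pattern learned for more than one relation,
    since it is unlikely to be relation-specific."""
    counts = defaultdict(int)
    for patterns in rules.values():
        for p in patterns:
            counts[p] += 1
    return {rel: {p for p in patterns if counts[p] == 1}
            for rel, patterns in rules.items()}
```

A toy run: with seeds `[("marriage", ("Alice", "Bob"))]` and the sentence "Alice married Bob in 2001.", the learned pattern for `marriage` is `"married"`. The real system operates at a much larger scale (3M sentences, 39 relations) and matches rules against dependency parses rather than raw token spans.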