Learning to Refine an Automatically Extracted Knowledge Base Using Markov Logic

  • Authors:
  • Shangpu Jiang;Daniel Lowd;Dejing Dou

  • Affiliations:
  • -;-;-

  • Venue:
  • ICDM '12 Proceedings of the 2012 IEEE 12th International Conference on Data Mining
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

A number of text mining and information extraction projects such as Text Runner and NELL seek to automatically build knowledge bases from the rapidly growing amount of information on the web. In order to scale to the size of the web, these projects often employ ad hoc heuristics to reason about uncertain and contradictory information rather than reasoning jointly about all candidate facts. In this paper, we present a Markov logic-based system for cleaning an extracted knowledge base. This allows a scalable system such as NELL to take advantage of joint probabilistic inference, or, conversely, allows Markov logic to be applied to a web scale problem. Our system uses only the ontological constraints and confidence values of the original system, along with human-labeled data if available. The labeled data can be used to calibrate the confidence scores from the original system or learn the effectiveness of individual extraction patterns. To achieve scalability, we introduce a neighborhood grounding method that only instantiates the part of the network most relevant to the given query. This allows us to partition the knowledge cleaning task into tractable pieces that can be solved individually. In experiments on NELL's knowledge base, we evaluate several variants of our approach and find that they improve both F1 and area under the precision-recall curve.