Probabilistic matrix factorization leveraging contexts for unsupervised relation extraction

  • Authors:
  • Shingo Takamatsu;Issei Sato;Hiroshi Nakagawa

  • Affiliations:
  • Sony Corporation, Tokyo, Japan;University of Tokyo, Bunkyo-ku, Tokyo, Japan;University of Tokyo, Bunkyo-ku, Tokyo, Japan

  • Venue:
  • PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

The clustering of the semantic relations between entities extracted from a corpus is one of the main issues in unsupervised relation extraction (URE). Previous methods assume a huge corpus because they have utilized frequently appearing entity pairs in the corpus. In this paper, we present a URE that works well for a small corpus by using word sequences extracted as relations. The feature vectors of the word sequences are extremely sparse. To deal with the sparseness problem, we take the two approaches: dimension reduction and leveraging context in the whole corpus including sentences from which no relations are extracted. The context in this case is captured with feature co-occurrences, which indicate appearances of two features in a single sentence. The approaches are implemented by a probabilistic matrix factorization that jointly factorizes the matrix of the feature vectors and the matrix of the feature co-occurrences. Experimental results show that our method outperforms previously proposed methods.