Probabilistic matrix factorization leveraging contexts for unsupervised relation extraction

Authors:
Shingo Takamatsu;Issei Sato;Hiroshi Nakagawa
Affiliations:
Sony Corporation, Tokyo, Japan;University of Tokyo, Bunkyo-ku, Tokyo, Japan;University of Tokyo, Bunkyo-ku, Tokyo, Japan
Venue:
PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
Year:
2011

Abstract

Clustering the semantic relations between entities extracted from a corpus is one of the main issues in unsupervised relation extraction (URE). Previous methods assume a huge corpus because they rely on entity pairs that appear frequently in the corpus. In this paper, we present a URE method that works well on a small corpus by using word sequences extracted as relations. The feature vectors of these word sequences are extremely sparse. To deal with the sparseness problem, we take two approaches: dimension reduction, and leveraging context from the whole corpus, including sentences from which no relations are extracted. The context is captured with feature co-occurrences, each of which indicates that two features appear in the same sentence. Both approaches are implemented by a probabilistic matrix factorization that jointly factorizes the matrix of feature vectors and the matrix of feature co-occurrences. Experimental results show that our method outperforms previously proposed methods.
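
The joint factorization described in the abstract can be illustrated with a minimal sketch. This is not the authors' model or inference procedure, which the abstract does not specify. It assumes Gaussian likelihoods and Gaussian priors, so that MAP estimation reduces to regularized squared error, and it assumes the two matrices are coupled through a shared feature factor `V`. The function name `joint_factorize`, the parameters `alpha`, `lam`, `lr`, and the toy matrices `X` and `C` are all hypothetical.

```python
import numpy as np

def joint_factorize(X, C, rank=2, alpha=0.5, lam=0.01, lr=0.01, steps=2000, seed=0):
    """Jointly factorize X ~ U V^T and C ~ V Z^T with a shared feature factor V.

    X: (relations x features) sparse feature matrix.
    C: (features x features) feature co-occurrence matrix.
    alpha weights the co-occurrence term; lam is the L2 (Gaussian prior) weight.
    These modelling choices are assumptions, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    U = 0.1 * rng.standard_normal((n, rank))  # latent relation factors
    V = 0.1 * rng.standard_normal((m, rank))  # latent feature factors (shared)
    Z = 0.1 * rng.standard_normal((m, rank))  # latent context factors
    losses = []
    for _ in range(steps):
        Ex = U @ V.T - X  # reconstruction error on the feature matrix
        Ec = V @ Z.T - C  # reconstruction error on the co-occurrence matrix
        reg = np.sum(U**2) + np.sum(V**2) + np.sum(Z**2)
        losses.append(0.5 * (np.sum(Ex**2) + alpha * np.sum(Ec**2) + lam * reg))
        # Gradients of the loss above with respect to each factor.
        gU = Ex @ V + lam * U
        gV = Ex.T @ U + alpha * (Ec @ Z) + lam * V
        gZ = alpha * (Ec.T @ V) + lam * Z
        U -= lr * gU
        V -= lr * gV
        Z -= lr * gZ
    return U, V, Z, losses

# Toy data (made up): 4 extracted relations described by 5 binary features.
X = np.array([[1, 1, 0, 0, 0],
              [1, 0, 1, 0, 0],
              [0, 0, 0, 1, 1],
              [0, 0, 1, 1, 0]], dtype=float)
# Toy co-occurrence matrix: C[i, j] = 1 if features i and j share a sentence.
C = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 1],
              [0, 0, 0, 1, 0]], dtype=float)

U, V, Z, losses = joint_factorize(X, C)
```

In this sketch the rows of `U` are low-dimensional representations of the relations, so the dimension reduction and the use of corpus-wide context happen in one optimization. A clustering step over the rows of `U` would follow, and its choice is left open here.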