Latent Topic Extraction from Relational Table for Record Matching

  • Authors:
  • Atsuhiro Takasu; Daiji Fukagawa; Tatsuya Akutsu

  • Affiliations:
  • National Institute of Informatics, Tokyo, Japan; National Institute of Informatics, Tokyo, Japan; Kyoto University, Kyoto, Japan 611-0011

  • Venue:
  • DS '09 Proceedings of the 12th International Conference on Discovery Science
  • Year:
  • 2009

Abstract

We propose a latent feature extraction method for record linkage. We first introduce a probabilistic model that generates records together with their latent topics. The generative model is designed to exploit co-occurrence among the attributes of a record. We then derive a topic estimation algorithm based on Gibbs sampling. The estimated topics are used to match records. The algorithm is unsupervised; i.e., it does not require training data, which is labor-intensive to prepare. We evaluated the model on bibliographic records and found that it tended to perform better on records with more attributes, because it exploits their co-occurrence.
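
The abstract does not specify the generative model, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes an LDA-style model in which each record has its own topic mixture and each attribute has its own topic-specific word distributions, and it estimates topics with collapsed Gibbs sampling. The function names (`gibbs_record_topics`, `cosine`), the hyperparameters `alpha` and `beta`, and the toy records are all hypothetical.

```python
import math
import random
from collections import defaultdict


def gibbs_record_topics(records, num_topics=2, alpha=0.1, beta=0.01,
                        iters=200, seed=0):
    """Collapsed Gibbs sampling for an assumed LDA-style record model.

    records: list of dicts mapping attribute name -> list of tokens.
    Returns one topic-proportion vector per record.
    """
    rng = random.Random(seed)

    # Vocabulary per attribute (assumption: attributes do not share words).
    vocab = defaultdict(set)
    for rec in records:
        for attr, tokens in rec.items():
            vocab[attr].update(tokens)

    n_dk = [[0] * num_topics for _ in records]  # record-topic counts
    n_akw = defaultdict(int)                    # (attr, topic, word) counts
    n_ak = defaultdict(int)                     # (attr, topic) counts

    # Random initial topic assignment for every token.
    assignments = []
    for d, rec in enumerate(records):
        row = []
        for attr, tokens in rec.items():
            for w in tokens:
                k = rng.randrange(num_topics)
                row.append([attr, w, k])
                n_dk[d][k] += 1
                n_akw[(attr, k, w)] += 1
                n_ak[(attr, k)] += 1
        assignments.append(row)

    for _ in range(iters):
        for d, row in enumerate(assignments):
            for item in row:
                attr, w, k = item
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_akw[(attr, k, w)] -= 1
                n_ak[(attr, k)] -= 1
                # Conditional p(z = t | everything else), up to a constant.
                v = len(vocab[attr])
                weights = [
                    (n_dk[d][t] + alpha)
                    * (n_akw[(attr, t, w)] + beta)
                    / (n_ak[(attr, t)] + beta * v)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                item[2] = k
                n_dk[d][k] += 1
                n_akw[(attr, k, w)] += 1
                n_ak[(attr, k)] += 1

    # Smoothed topic proportions per record, from the final sample.
    return [
        [(c + alpha) / (sum(row) + alpha * num_topics) for c in row]
        for row in n_dk
    ]


def cosine(u, v):
    """Cosine similarity between two topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


if __name__ == "__main__":
    # Toy bibliographic records (invented for illustration only).
    toy = [
        {"title": ["topic", "model"], "venue": ["ds"]},
        {"title": ["topic", "models"], "venue": ["ds"]},
        {"title": ["protein", "folding"], "venue": ["bioinformatics"]},
    ]
    theta = gibbs_record_topics(toy, num_topics=2, iters=100)
    print(cosine(theta[0], theta[1]), cosine(theta[0], theta[2]))
```

Because all attributes of a record share one topic mixture in this sketch, tokens that co-occur across attributes pull the record toward the same topics, which is one plausible way to use attribute co-occurrence. Records could then be compared by the similarity of their topic vectors, with no labeled training pairs required.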