Latent Topic Extraction from Relational Table for Record Matching

Authors:
Atsuhiro Takasu;Daiji Fukagawa;Tatsuya Akutsu
Affiliations:
National Institute of Informatics, Tokyo, Japan;National Institute of Informatics, Tokyo, Japan;Kyoto University, Kyoto, Japan 611-0011
Venue:
DS '09 Proceedings of the 12th International Conference on Discovery Science
Year:
2009

Citing 6

Latent Dirichlet allocation
The Journal of Machine Learning Research

Adaptive duplicate detection using learnable string similarity measures
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining

Group and topic discovery from relations and text
Proceedings of the 3rd international workshop on Link discovery

Efficient topic-based unsupervised name disambiguation
Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries

A Latent Topic Model for Complete Entity Resolution
ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering

Extracting key phrases to disambiguate personal name queries in web search
CLIIR '06 Proceedings of the Workshop on How Can Computational Linguistics Improve Information Retrieval?

Cited 1

Cross-lingual keyword recommendation using latent topics
Proceedings of the 1st International Workshop on Information Heterogeneity and Fusion in Recommender Systems

Abstract

We propose a latent feature extraction method for record linkage. We first introduce a probabilistic model that generates records together with their latent topics. The generative model is designed to exploit co-occurrence among the attributes of a record. We then derive a topic estimation algorithm based on Gibbs sampling. The estimated topics are used to identify records. The algorithm is unsupervised; i.e., it does not require labor-intensive preparation of training data. We evaluated the proposed model on bibliographic records and found that the method tended to perform better for records with more attributes, because it exploits their co-occurrence.
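
The abstract does not spell out the generative model, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes a much simpler model: each record has exactly one latent topic, each attribute holds a single categorical value, and each topic has its own distribution over the values of each attribute. Under those assumptions, a collapsed Gibbs sampler resamples one record's topic at a time from counts of how often each attribute value co-occurs with each topic. The function name `gibbs_record_topics`, the hyperparameters `alpha` and `beta`, and the toy bibliographic records are all hypothetical.

```python
import random
from collections import defaultdict


def gibbs_record_topics(records, num_topics, iters=200, alpha=1.0, beta=0.1, seed=0):
    """Assign one latent topic per record by collapsed Gibbs sampling.

    Assumed model (not taken from the paper): a mixture in which each topic
    has a separate categorical distribution for every attribute.
    """
    rng = random.Random(seed)
    num_attrs = len(records[0])
    # Number of distinct values observed for each attribute.
    vocab_sizes = [len({rec[a] for rec in records}) for a in range(num_attrs)]

    # Random initial topic assignment and the corresponding counts.
    z = [rng.randrange(num_topics) for _ in records]
    topic_count = [0] * num_topics
    value_count = [[defaultdict(int) for _ in range(num_attrs)]
                   for _ in range(num_topics)]
    for rec, k in zip(records, z):
        topic_count[k] += 1
        for a, v in enumerate(rec):
            value_count[k][a][v] += 1

    for _ in range(iters):
        for i, rec in enumerate(records):
            # Remove record i from the counts of its current topic.
            k_old = z[i]
            topic_count[k_old] -= 1
            for a, v in enumerate(rec):
                value_count[k_old][a][v] -= 1

            # Unnormalized conditional probability of each topic for record i.
            weights = []
            for k in range(num_topics):
                w = topic_count[k] + alpha
                for a, v in enumerate(rec):
                    w *= (value_count[k][a][v] + beta) / (
                        topic_count[k] + beta * vocab_sizes[a])
                weights.append(w)

            # Sample a new topic and add the record back to the counts.
            k_new = rng.choices(range(num_topics), weights=weights)[0]
            z[i] = k_new
            topic_count[k_new] += 1
            for a, v in enumerate(rec):
                value_count[k_new][a][v] += 1
    return z


# Hypothetical toy records: (author, venue, keyword).
records = [
    ("takasu", "DS", "topic"),
    ("takasu", "DS", "matching"),
    ("fukagawa", "DS", "topic"),
    ("smith", "VLDB", "index"),
    ("smith", "VLDB", "query"),
    ("jones", "VLDB", "index"),
]
topics = gibbs_record_topics(records, num_topics=2)
print(topics)
```

Records that end up with the same topic label can then be treated as candidates for matching. The sampler is stochastic, so the labels depend on the seed, and which topic receives which label is arbitrary.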