Generative models for name disambiguation

Authors:
Yang Song;Jian Huang;Isaac G. Councill;Jia Li;C. Lee Giles
Affiliations:
Pennsylvania State University;Pennsylvania State University;Pennsylvania State University;Pennsylvania State University;Pennsylvania State University
Venue:
Proceedings of the 16th international conference on World Wide Web
Year:
2007

Citing (2)

Probabilistic latent semantic indexing
Proceedings of the 22nd annual international ACM SIGIR conference on Research and development in information retrieval

Latent Dirichlet allocation
The Journal of Machine Learning Research

Cited by (3)

Author name disambiguation for citations on the deep web
WAIM'10 Proceedings of the 2010 international conference on Web-age information management

Combining machine learning and human judgment in author disambiguation
Proceedings of the 20th ACM international conference on Information and knowledge management

Automatic annotation of ambiguous personal names on the web
Computational Intelligence

Abstract

Name ambiguity is a special case of identity uncertainty in which one person may be referenced by multiple name variations in different situations, or may even share the same name with other people. In this paper, we present an efficient framework built on two novel topic-based models, extended from Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA). Our models explicitly introduce a new variable for persons and learn the distribution of topics with regard to persons and words. Experiments indicate that our approach consistently outperforms other unsupervised methods, including spectral and DBSCAN clustering. Scalability is addressed by disambiguating authors in over 750,000 papers from the entire CiteSeer dataset.