A Unified Probabilistic Framework for Name Disambiguation in Digital Library

Authors:
Jie Tang;Alvis C. M. Fong;Bo Wang;Jing Zhang
Affiliations:
Tsinghua University, Beijing;Auckland University of Technology, Auckland;Nanjing University of Aeronautics and Astronautics, Nanjing;Tsinghua University, Beijing
Venue:
IEEE Transactions on Knowledge and Data Engineering
Year:
2012

Citing 0
Cited 10

Topic-level social network search
Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining

Active associative sampling for author name disambiguation
Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries

A brief survey of automatic methods for author name disambiguation
ACM SIGMOD Record

Author name disambiguation using a new categorical distribution similarity
ECML PKDD'12 Proceedings of the 2012 European conference on Machine Learning and Knowledge Discovery in Databases - Volume Part I

An automatic system for identifying authorities in digital libraries
Expert Systems with Applications: An International Journal

A relevance feedback approach for the author name disambiguation problem
Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries

Towards a fair comparison between name disambiguation approaches
Proceedings of the 10th Conference on Open Research Areas in Information Retrieval

Bootstrapping active name disambiguation with crowdsourcing
Proceedings of the 22nd ACM international conference on information & knowledge management

Contextual rule-based feature engineering for author-paper identification
Proceedings of the 2013 KDD Cup 2013 Workshop

A semi-supervised approach for author disambiguation in KDD CUP 2013
Proceedings of the 2013 KDD Cup 2013 Workshop

Quantified Score

Hi-index	0.00

Abstract

Despite years of research, the name ambiguity problem remains largely unresolved. Outstanding issues include how to capture all information for name disambiguation in a unified approach, and how to determine the number of people K in the disambiguation process. In this paper, we formalize the problem in a unified probabilistic framework, which incorporates both attributes and relationships. Specifically, we define a disambiguation objective function for the problem and propose a two-step parameter estimation algorithm. We also investigate a dynamic approach for estimating the number of people K. Experiments show that our proposed framework significantly outperforms four clustering-based baseline methods and two other previously proposed methods. Experiments also indicate that the number K automatically found by our method is close to the actual number.