Author disambiguation by hierarchical agglomerative clustering with adaptive stopping criterion

Authors:
Lei Cen;Eduard C. Dragut;Luo Si;Mourad Ouzzani
Affiliations:
Purdue University, West Lafayette, IN, USA;Purdue University, West Lafayette, IN, USA;Purdue University, West Lafayette, IN, USA;Qatar Computing Research Institute, Doha, Qatar
Venue:
Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval
Year:
2013

Abstract

Entity disambiguation is an important step in many information retrieval applications. This paper addresses entity disambiguation with a focus on author name disambiguation in digital libraries. In particular, pairwise similarity is first learned for publications that share the same author name string (ANS). A novel Hierarchical Agglomerative Clustering approach with Adaptive Stopping Criterion (HACASC) is then proposed to partition the set of publications that share the same ANS into clusters, each corresponding to a distinct author identity. The HACASC approach uses a mixture of kernel ridge regressions to determine the clustering threshold adaptively, which yields a more appropriate clustering granularity than a non-adaptive stopping criterion. We conduct a large-scale empirical study on a dataset of more than 2 million publication record pairs to demonstrate the advantage of the proposed HACASC approach.
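
The clustering step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the pairwise similarities are already available as a symmetric matrix, uses average linkage as an arbitrary choice, and takes the stopping threshold as a plain argument. In HACASC that threshold would be predicted for each ANS by the mixture of kernel ridge regressions, which is not reproduced here. The names `average_link` and `hac_with_threshold` are hypothetical.

```python
from itertools import combinations


def average_link(sim, a, b):
    """Mean pairwise similarity between the members of clusters a and b."""
    return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))


def hac_with_threshold(sim, threshold):
    """Agglomerative clustering that stops once no pair of clusters
    has average-link similarity of at least `threshold`.

    sim       -- symmetric matrix (list of lists) of pairwise similarities
                 between publications sharing one author name string
    threshold -- stopping threshold; assumed to be given here, whereas the
                 paper determines it adaptively per name
    """
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > 1:
        # Find the most similar pair of current clusters (p < q always holds).
        (p, q), best = max(
            (
                ((p, q), average_link(sim, clusters[p], clusters[q]))
                for p, q in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if best < threshold:
            break  # stopping criterion reached: remaining clusters stay apart
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return [sorted(c) for c in clusters]


# Toy similarity matrix for four publications with the same ANS (made-up values).
toy_sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.1, 0.2],
    [0.2, 0.1, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]

two_authors = hac_with_threshold(toy_sim, threshold=0.5)
one_author = hac_with_threshold(toy_sim, threshold=0.1)
print(two_authors)
print(one_author)
```

On the toy matrix, a threshold of 0.5 keeps publications 0 and 1 apart from publications 2 and 3, while a threshold of 0.1 merges all four into a single cluster. This shows why the choice of threshold controls clustering granularity and why a fixed value may not suit every name.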