Entity disambiguation is an important step in many information retrieval applications. This paper addresses entity disambiguation with a focus on name disambiguation in digital libraries. Pairwise similarity is first learned for publications that share the same author name string (ANS). A novel Hierarchical Agglomerative Clustering approach with Adaptive Stopping Criterion (HACASC) is then proposed to partition the set of publications that share an ANS into clusters, each corresponding to a distinct author identity. HACASC uses a mixture of kernel ridge regressions to determine the clustering threshold adaptively, which yields a more appropriate clustering granularity than a non-adaptive stopping criterion. We conduct a large-scale empirical study on a dataset of more than 2 million publication record pairs to demonstrate the advantage of the proposed HACASC approach.
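The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes average-linkage merging over a precomputed pairwise similarity matrix with values in [0, 1], and it replaces the mixture of kernel ridge regressions with a hypothetical callable, `threshold_fn`, that returns a stopping threshold for a given block of publications. The names `average_link`, `hac_with_threshold`, and `toy_sim` are also illustrative.

```python
from typing import Callable, List, Sequence


def average_link(sim: Sequence[Sequence[float]], a: List[int], b: List[int]) -> float:
    """Mean pairwise similarity between two clusters of item indices."""
    total = sum(sim[i][j] for i in a for j in b)
    return total / (len(a) * len(b))


def hac_with_threshold(
    sim: Sequence[Sequence[float]],
    threshold_fn: Callable[[int], float],
) -> List[List[int]]:
    """Agglomerative clustering that stops once the most similar pair of
    clusters falls below the threshold chosen for this name block.

    Assumes similarities lie in [0, 1]."""
    n = len(sim)
    # Hypothetical stand-in: the paper predicts this value with a mixture
    # of kernel ridge regressions; here any callable can supply it.
    threshold = threshold_fn(n)
    clusters = [[i] for i in range(n)]  # start with one cluster per publication
    while len(clusters) > 1:
        best_score, best_pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                score = average_link(sim, clusters[x], clusters[y])
                if score > best_score:
                    best_score, best_pair = score, (x, y)
        if best_pair is None or best_score < threshold:
            break  # stopping criterion: no remaining pair is similar enough
        x, y = best_pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return [sorted(c) for c in clusters]


# Toy similarities for four publications that share one author name string.
toy_sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
result = hac_with_threshold(toy_sim, lambda n: 0.5)
print(result)
```

With a fixed threshold of 0.5, the toy example groups publications 0 and 1 into one author identity and publications 2 and 3 into another. Raising or lowering the threshold changes the granularity, which is the quantity an adaptive criterion would set per name block rather than globally.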