On Graph-Based Name Disambiguation

Authors:
Xiaoming Fan;Jianyong Wang;Xu Pu;Lizhu Zhou;Bing Lv
Affiliations:
Tsinghua University;Tsinghua University;Tsinghua University;Tsinghua University;Tsinghua University
Venue:
Journal of Data and Information Quality (JDIQ)
Year:
2011

Citing 24
Cited 7

The merge/purge problem for large databases

SIGMOD '95 Proceedings of the 1995 ACM SIGMOD international conference on Management of data
Autonomous citation matching

Proceedings of the third annual conference on Autonomous Agents
Efficient clustering of high-dimensional data sets with application to reference matching

Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining
Interactive deduplication using active learning

Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
Robust and efficient fuzzy match for online data cleaning

Proceedings of the 2003 ACM SIGMOD international conference on Management of data
Adaptive duplicate detection using learnable string similarity measures

Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Two supervised learning approaches for name disambiguation in author citations

Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries
A probabilistic framework for semi-supervised clustering

Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
Disambiguating Web appearances of people in a social network

WWW '05 Proceedings of the 14th international conference on World Wide Web
Name disambiguation in author citations using a K-way spectral clustering method

Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries
Reference reconciliation in complex information spaces

Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Effective and scalable solutions for mixed and split citation problems in digital libraries

Proceedings of the 2nd international workshop on Information quality in information systems
On mining cross-graph quasi-cliques

Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Adaptive Name Matching in Information Integration

IEEE Intelligent Systems
Domain-independent data cleaning via analysis of entity-relationship graph

ACM Transactions on Database Systems (TODS)
Contextual search and name disambiguation in email using graphs

SIGIR '06 Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval
Improving Grouped-Entity Resolution Using Quasi-Cliques

ICDM '06 Proceedings of the Sixth International Conference on Data Mining
Collective entity resolution in relational data

ACM Transactions on Knowledge Discovery from Data (TKDD)
Adaptive graphical approach to entity resolution

Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
A constraint-based probabilistic framework for name disambiguation

Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
Clustering by soft-constraint affinity propagation

Bioinformatics
GHOST: an effective graph-based framework for name distinction

Proceedings of the 17th ACM conference on Information and knowledge management
GRAPE: A Graph-Based Framework for Disambiguating People Appearances in Web Search

ICDM '09 Proceedings of the 2009 Ninth IEEE International Conference on Data Mining
Constructing treatment portfolios using affinity propagation

RECOMB'08 Proceedings of the 12th annual international conference on Research in computational molecular biology

Context-based entity description rule for entity resolution

Proceedings of the 20th ACM international conference on Information and knowledge management
A tool for generating synthetic authorship records for evaluating author name disambiguation methods

Information Sciences: an International Journal
Active associative sampling for author name disambiguation

Proceedings of the 12th ACM/IEEE-CS joint conference on Digital Libraries
Citation-based bootstrapping for large-scale author disambiguation

Journal of the American Society for Information Science and Technology
A brief survey of automatic methods for author name disambiguation

ACM SIGMOD Record
A relevance feedback approach for the author name disambiguation problem

Proceedings of the 13th ACM/IEEE-CS joint conference on Digital libraries
Author disambiguation by hierarchical agglomerative clustering with adaptive stopping criterion

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval

Abstract

Name ambiguity arises because many people or objects share identical names in the real world. It degrades the performance of document retrieval, Web search, and information integration, and may cause confusion in other applications. Because the names are spelled identically and little other information is available, distinguishing the entities accurately is a nontrivial task. In this article, we investigate the problem in digital libraries, where the goal is to distinguish publications written by authors with identical names. We present an effective framework named GHOST (an abbreviation of GrapHical framewOrk for name diSambiguaTion) to solve the problem systematically. We devise a novel similarity metric and use only one type of attribute (i.e., coauthorship) in GHOST. Given the similarity matrix, intermediate results are grouped into clusters with a recently introduced clustering algorithm called Affinity Propagation. In addition, as a complementary technique, user feedback can be used to improve performance. We evaluated the framework on real DBLP and PubMed datasets, and the experimental results show that GHOST achieves both high precision and high recall.
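
As a rough illustration of the pipeline the abstract outlines (a coauthorship-based similarity matrix, then Affinity Propagation), the sketch below clusters a handful of publications that carry the same ambiguous author name. It is not GHOST's similarity metric, which the abstract does not spell out; plain Jaccard overlap of coauthor sets is swapped in as a stand-in. The records, the coauthor names, the preference value 0.1, and all function names are hypothetical, and the clustering routine is a minimal textbook-style Affinity Propagation loop without convergence checks or tie-breaking noise.

```python
from typing import Dict, List, Set

# Hypothetical toy records: paper id -> coauthors of the ambiguous author name.
papers: Dict[str, Set[str]] = {
    "p0": {"A. Chen", "B. Liu"},
    "p1": {"A. Chen", "B. Liu", "C. Zhao"},
    "p2": {"B. Liu", "C. Zhao"},
    "p3": {"D. Kim", "E. Park"},
    "p4": {"D. Kim", "E. Park", "F. Lee"},
    "p5": {"E. Park", "F. Lee"},
}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Stand-in similarity (an assumption, not GHOST's metric)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_matrix(sets: List[Set[str]], preference: float) -> List[List[float]]:
    """Pairwise similarities; the diagonal holds the exemplar preference."""
    n = len(sets)
    return [[preference if i == k else jaccard(sets[i], sets[k])
             for k in range(n)] for i in range(n)]


def affinity_propagation(S: List[List[float]], damping: float = 0.5,
                         iterations: int = 200) -> List[int]:
    """Return, for each item, the index of the exemplar it is assigned to."""
    n = len(S)  # assumes at least two items
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over j != k of (a(i,j) + s(i,j)), damped.
        for i in range(n):
            for k in range(n):
                best = max(A[i][j] + S[i][j] for j in range(n) if j != k)
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best)
        # a(i,k) = min(0, r(k,k) + sum of positive r(j,k), j not in {i,k});
        # a(k,k) = sum of positive r(j,k), j != k. Both damped.
        for k in range(n):
            pos = [max(0.0, R[j][k]) for j in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        return list(range(n))  # fallback: every item forms its own cluster
    labels = [max(exemplars, key=lambda k: S[i][k]) for i in range(n)]
    for k in exemplars:
        labels[k] = k  # an exemplar always represents itself
    return labels


ids = list(papers)
S = similarity_matrix([papers[p] for p in ids], preference=0.1)
labels = affinity_propagation(S)
clusters: Dict[str, List[str]] = {}
for pid, lab in zip(ids, labels):
    clusters.setdefault(ids[lab], []).append(pid)
print(clusters)  # exemplar paper id -> papers attributed to the same author
```

The diagonal preference controls how readily a publication becomes an exemplar, and therefore how many distinct authors are reported; in practice it would need tuning per dataset. A production pipeline would more likely rely on an established Affinity Propagation implementation than on this hand-rolled loop.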