Robust hybrid name disambiguation framework for large databases

Authors:
Jia Zhu;Yi Yang;Qing Xie;Liwei Wang;Saeed-Ul Hassan
Affiliations:
School of Computer Science, South China Normal University, Guangzhou, China;School of Computer Science, Carnegie Mellon University, Pittsburgh, USA;Division of CEMSE, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia;Wuhan University, Wuhan, China;COMSATS Institute of Information Technology, Lahore, Pakistan
Venue:
Scientometrics
Year:
2014


Abstract

In many databases, such as scientific bibliography databases, the name attribute is the most commonly chosen identifier for entities. However, names are often ambiguous and not always unique, which causes problems in many fields. Name disambiguation is a non-trivial data management task that aims to distinguish different entities sharing the same name. It is particularly hard for large databases such as digital libraries, where only limited information is available to identify authors. In digital libraries, ambiguous author names arise when multiple authors share the same name or when one person appears under different name variations. Most previous work on this problem employs hierarchical clustering approaches based on information inside the citation records, e.g. co-authors and publication titles. In this paper, we propose a robust hybrid name disambiguation framework that is not only applicable to digital libraries but can also be easily extended to other applications based on different data sources. We propose a web page genre identification component to identify the genre of a web page, e.g. whether the page is a personal homepage. In addition, we propose a re-clustering model based on multidimensional scaling that can further improve name disambiguation performance. We evaluated our approach on known corpora, and the favorable experimental results indicate that the proposed framework is feasible.
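
To make the clustering idea in the abstract concrete, the following is a minimal sketch of hierarchical (agglomerative) clustering over citation records that share an ambiguous author name, using only co-authors and title words. It is not the framework proposed in the paper: the record fields, the Jaccard similarity, the 0.6/0.4 weights, the single-linkage rule, the 0.25 threshold, and the sample records are all assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def record_similarity(r1, r2, w_coauthor=0.6, w_title=0.4):
    """Weighted similarity over co-authors and title words (weights are assumed)."""
    co = jaccard(set(r1["coauthors"]), set(r2["coauthors"]))
    ti = jaccard(set(r1["title"].lower().split()),
                 set(r2["title"].lower().split()))
    return w_coauthor * co + w_title * ti


def cluster_records(records, threshold=0.25):
    """Single-linkage agglomerative clustering.

    Repeatedly merges the two most similar clusters and stops once the
    best available similarity falls below `threshold` (an assumed value).
    Returns clusters as sorted lists of record indices.
    """
    clusters = [[i] for i in range(len(records))]
    while len(clusters) > 1:
        best, best_pair = -1.0, None
        for (i, ci), (j, cj) in combinations(enumerate(clusters), 2):
            # Single linkage: similarity of the closest pair of members.
            sim = max(record_similarity(records[a], records[b])
                      for a in ci for b in cj)
            if sim > best:
                best, best_pair = sim, (i, j)
        if best < threshold:
            break
        i, j = best_pair  # i < j, so deleting j keeps index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Hypothetical records that all carry the same ambiguous author name.
records = [
    {"title": "Graph clustering for citation data",
     "coauthors": ["A. Smith", "B. Lee"]},
    {"title": "Scalable graph clustering methods",
     "coauthors": ["B. Lee", "C. Kumar"]},
    {"title": "Protein folding simulation on GPUs",
     "coauthors": ["D. Ortiz"]},
]

groups = cluster_records(records)
```

Each resulting group is a set of records attributed to one presumed author. In this sketch the first two records share a co-author and two title words, so they are merged, while the third stays separate.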