Vietnamese author name disambiguation for integrating publications from heterogeneous sources

Authors:
Tin Huynh;Kiem Hoang;Tien Do;Duc Huynh
Affiliations:
University of Information Technology, HCMC, Vietnam;University of Information Technology, HCMC, Vietnam;University of Information Technology, HCMC, Vietnam;University of Information Technology, HCMC, Vietnam
Venue:
ACIIDS'13 Proceedings of the 5th Asian conference on Intelligent Information and Database Systems - Volume Part I
Year:
2013

Abstract

Automatic integration of bibliographical data from various sources is a critical task in the field of digital libraries. One of the most important challenges in this process is author name disambiguation. In this paper, we applied a supervised learning approach and proposed a set of features that can be used to train classifiers for disambiguating Vietnamese author names. To evaluate the effectiveness of the proposed feature set, we ran experiments with five supervised learning methods: Random Forest, Support Vector Machine (SVM), k-Nearest Neighbors (kNN), C4.5 (Decision Tree), and Bayes. The experimental dataset was collected from three online digital libraries: Microsoft Academic Search, ACM Digital Library, and IEEE Digital Library. Our experiments showed that the kNN, Random Forest, and C4.5 classifiers outperform the others. The average accuracy achieved is approximately 94.55% with kNN, 94.23% with Random Forest, 93.98% with C4.5, 91.91% with SVM, and 81.56% with Bayes, the lowest. In summary, the highest accuracy we achieved for the author name disambiguation problem with the proposed feature set was 98.39% in our experiments on the Vietnamese authors dataset.
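
The abstract does not list the proposed features, so the following is only a minimal sketch of how pairwise supervised disambiguation of this kind could look, not the authors' method. It assumes three hypothetical features (name-token overlap after stripping Vietnamese diacritics, co-author overlap, and affiliation-word overlap) and uses a tiny hand-written kNN in place of the toolkit classifiers used in the experiments. The record fields (`author`, `coauthors`, `affiliation`) and all sample values are invented for illustration.

```python
import unicodedata
from collections import Counter
from math import sqrt


def normalize_name(name):
    """Lowercase, strip diacritics, and return name tokens as a sorted tuple.

    Sorting makes the result independent of token order, since the same
    author may appear family-name-first or family-name-last across sources.
    """
    # 'đ' and 'Đ' have no canonical decomposition, so map them explicitly.
    name = name.replace("đ", "d").replace("Đ", "D")
    decomposed = unicodedata.normalize("NFD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return tuple(sorted(stripped.lower().split()))


def jaccard(a, b):
    """Jaccard similarity of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pair_features(rec1, rec2):
    """Feature vector for a pair of records (hypothetical feature set)."""
    return [
        jaccard(normalize_name(rec1["author"]), normalize_name(rec2["author"])),
        jaccard(rec1["coauthors"], rec2["coauthors"]),
        jaccard(rec1["affiliation"].lower().split(),
                rec2["affiliation"].lower().split()),
    ]


def knn_predict(train, query, k=3):
    """Majority vote among the k training pairs nearest to `query`.

    `train` is a list of (feature_vector, label) tuples, where label 1 means
    "same author" and 0 means "different authors".
    """
    ranked = sorted(
        (sqrt(sum((x - y) ** 2 for x, y in zip(features, query))), label)
        for features, label in train
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy records, invented for illustration only.
    rec_a = {"author": "Huỳnh Ngọc Tín", "coauthors": ["kiem hoang"],
             "affiliation": "University of Information Technology"}
    rec_b = {"author": "Tin Huynh", "coauthors": ["kiem hoang", "tien do"],
             "affiliation": "University of Information Technology HCMC"}
    print(pair_features(rec_a, rec_b))
```

In this sketch each labeled training example is a pair of publication records, and the classifier decides whether the two records refer to the same person. Any of the five classifiers named in the abstract could be substituted for `knn_predict` as long as it consumes the same feature vectors.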