Effective string processing and matching for author disambiguation

Authors:
Wei-Sheng Chin;Yu-Chin Juan;Yong Zhuang;Felix Wu;Hsiao-Yu Tung;Tong Yu;Jui-Pin Wang;Cheng-Xia Chang;Chun-Pai Yang;Wei-Cheng Chang;Kuan-Hao Huang;Tzu-Ming Kuo;Shan-Wei Lin;Young-San Lin;Yu-Chen Lu;Yu-Chuan Su;Cheng-Kuang Wei;Tu-Chun Yin;Chun-Liang Li;Ting-Wei Lin;Cheng-Hao Tsai;Shou-De Lin;Hsuan-Tien Lin;Chih-Jen Lin
Affiliations:
National Taiwan University (all authors)
Venue:
Proceedings of the 2013 KDD Cup 2013 Workshop
Year:
2013


Abstract

Track 2 of KDD Cup 2013 aims to identify duplicated authors in a data set from Microsoft Academic Search. This type of problem appears in many large-scale applications that compile information from different sources. This paper describes the solution we developed at National Taiwan University, which won first prize in the competition. We propose an effective name-matching framework and realize two implementations of it. An important strategy in our approach is to handle Chinese and non-Chinese names separately because of their different naming conventions. Post-processing, including merging the results of the two predictions, further boosts performance. Our approach achieves an F1 score of 0.99202 on the private leaderboard and 0.99195 on the public leaderboard.
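
To make the idea of name matching followed by merging two predictions concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not give their matching rules, so the normalization steps, the exact-match grouping, and the helper names `normalize`, `group_duplicates`, and `merge_predictions` are all assumptions chosen for illustration. The separate handling of Chinese and non-Chinese names is not modeled here.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(name):
    """Hypothetical normalization: strip accents, lowercase, drop punctuation."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = re.sub(r"[^a-z ]+", " ", no_accents.lower())
    return " ".join(cleaned.split())


def group_duplicates(authors):
    """Group author ids whose normalized names are identical (assumed rule)."""
    buckets = defaultdict(set)
    for author_id, name in authors.items():
        buckets[normalize(name)].add(author_id)
    return [ids for ids in buckets.values() if len(ids) > 1]


def merge_predictions(first, second):
    """Merge two lists of duplicate groups, joining groups that share an id."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for group in list(first) + list(second):
        ids = sorted(group)
        for other in ids[1:]:
            parent[find(other)] = find(ids[0])

    merged = defaultdict(set)
    for x in list(parent):
        merged[find(x)].add(x)
    return [ids for ids in merged.values() if len(ids) > 1]


# Illustrative data only; the ids and name variants are made up.
authors = {1: "Chih-Jen Lin", 2: "chih jen  LIN", 3: "Hsuan-Tien Lin"}
prediction_a = group_duplicates(authors)
prediction_b = [{2, 4}]  # stands in for the output of a second implementation
combined = merge_predictions(prediction_a, prediction_b)
```

Merging by union means that if either prediction links two author ids, the combined result links them too. This tends to raise recall at some risk to precision, so whether it helps the F1 score depends on how accurate each individual prediction is.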