Author name ambiguity is a common problem in digital publication libraries such as Microsoft Academic Search. It arises mainly because different authors may publish under the same name, while the same author may publish under several names owing to abbreviations, nicknames, and similar variations. Author disambiguation is the goal of Track II of the KDD Cup 2013 data mining contest. In this paper we introduce RankMatch, our ranking-based name matching algorithm and system. An important feature of our solution is its use of heterogeneous meta-paths to evaluate the similarity between two potentially duplicate authors whose names are compatible. We participated under the team name "SmallData", and our final solution achieved a Mean F1 score of 99.157%, ranking second in the contest.
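To make the idea of meta-path similarity between name-compatible authors concrete, the following is a minimal sketch and not the RankMatch implementation. It assumes a single meta-path, Author-Paper-Venue-Paper-Author, scored with a PathSim-style normalized path count, and a deliberately simple name-compatibility rule (same last name, first names agreeing up to an initial or prefix). The function names, the scoring formula, the compatibility rule, and the toy data are all assumptions made for illustration; the abstract does not specify any of them.

```python
from collections import Counter


def names_compatible(a: str, b: str) -> bool:
    """Hypothetical rule: same last name, and one first-name token
    is a prefix of the other (so "W. Wang" matches "Wei Wang")."""
    ta = a.lower().replace(".", "").split()
    tb = b.lower().replace(".", "").split()
    if not ta or not tb or ta[-1] != tb[-1]:
        return False
    fa, fb = ta[0], tb[0]
    return fa.startswith(fb) or fb.startswith(fa)


def venue_profile(author, author_papers, paper_venue):
    """Count Author -> Paper -> Venue path instances for each venue."""
    papers = author_papers.get(author, ())
    return Counter(paper_venue[p] for p in papers if p in paper_venue)


def metapath_similarity(a, b, author_papers, paper_venue):
    """PathSim-style score over Author-Paper-Venue-Paper-Author:
    2 * paths(a, b) / (paths(a, a) + paths(b, b))."""
    va = venue_profile(a, author_papers, paper_venue)
    vb = venue_profile(b, author_papers, paper_venue)
    cross = sum(va[v] * vb[v] for v in va.keys() & vb.keys())
    denom = sum(c * c for c in va.values()) + sum(c * c for c in vb.values())
    return 2.0 * cross / denom if denom else 0.0


# Toy data (invented for illustration only).
author_names = {"a1": "Wei Wang", "a2": "W. Wang", "a3": "Wei Wang"}
author_papers = {"a1": ["p1", "p2"], "a2": ["p3"], "a3": ["p4"]}
paper_venue = {"p1": "KDD", "p2": "KDD", "p3": "KDD", "p4": "STOC"}


def rank_candidates(target):
    """Rank name-compatible authors by meta-path similarity, highest first."""
    scored = [
        (other, metapath_similarity(target, other, author_papers, paper_venue))
        for other in author_names
        if other != target
        and names_compatible(author_names[target], author_names[other])
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_candidates("a1")` on the toy data places `a2` (who shares a venue with `a1`) ahead of `a3` (who shares none), even though `a3` has the identical name string. A real system would presumably combine several meta-paths (co-authors, keywords, affiliations) rather than rely on one.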