LOF: identifying density-based local outliers
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
The Journal of Machine Learning Research
Two supervised learning approaches for name disambiguation in author citations
Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries
Name disambiguation in author citations using a K-way spectral clustering method
Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries
Search engine driven author disambiguation
Proceedings of the 6th ACM/IEEE-CS joint conference on Digital libraries
A Survey of Web Information Extraction Systems
IEEE Transactions on Knowledge and Data Engineering
Author Name Disambiguation for Citations Using Topic and Web Correlation
ECDL '08 Proceedings of the 12th European conference on Research and Advanced Technology for Digital Libraries
A Term-Based Driven Clustering Approach for Name Disambiguation
APWeb/WAIM '09 Proceedings of the Joint International Conferences on Advances in Data and Web Management
Disambiguating authors in academic publications using random forests
Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries
LIBSVM: A library for support vector machines
ACM Transactions on Intelligent Systems and Technology (TIST)
Combining machine learning and human judgment in author disambiguation
Proceedings of the 20th ACM international conference on Information and knowledge management
A Unified Probabilistic Framework for Name Disambiguation in Digital Library
IEEE Transactions on Knowledge and Data Engineering
A brief survey of automatic methods for author name disambiguation
ACM SIGMOD Record
The Microsoft academic search dataset and KDD Cup 2013
Proceedings of the 2013 KDD Cup 2013 Workshop
Hi-index | 0.00 |
Name disambiguation, which aims to identify multiple names which correspond to one person and same names which refer to different persons, is one of the most important basic problems in many areas such as natural language processing, information retrieval and digital libraries. Microsoft academic search data in KDD Cup 2013 Track 2 task brings one such challenge to the researchers in the knowledge discovery and data mining community. Besides the real-world and large-scale characteristic, the Track 2 task raises several challenges: (1) Consideration of both synonym and polysemy problems; (2) Existence of huge amount of noisy data with missing attributes; (3) Absence of labeled data that makes this challenge a cold start problem. In this paper, we describe our solution to Track 2 of KDD Cup 2013. The challenge of this track is author disambiguation, which aims at identifying whether authors are the same person by using academic publication data. We propose a multi-phase semi-supervised approach to deal with the challenge. First, we preprocess the dataset and generate features for models, then construct a coauthor-based network and employ community detection to accomplish first-phase disambiguation task, which handles the cold-start problem. Second, using results in first phase, we use support vector machine and various other models to utilize noisy data with missing attributes in the dataset. Further, we propose a self-taught procedure to solve ambiguity in coauthor information, boosting performance of results from other models. Finally, by blending results from different models, we finally achieves 6th place with 0.98717 mean F-score on public leaderboard and 7th place with 0.98651 mean F-score on private leaderboard.