Despite years of research, the name ambiguity problem remains largely unresolved. Outstanding issues include how to capture all the information relevant to name disambiguation in a unified approach, and how to determine the number of people K in the disambiguation process. In this paper, we formalize the problem in a unified probabilistic framework that incorporates both attributes and relationships. Specifically, we define a disambiguation objective function for the problem and propose a two-step parameter estimation algorithm. We also investigate a dynamic approach for estimating the number of people K. Experiments show that the proposed framework significantly outperforms four clustering-based baseline methods and two other previously proposed methods. Experiments also indicate that the number K found automatically by our method is close to the actual number.
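To make the two ideas in the abstract concrete (combining attribute and relationship evidence, and letting K emerge rather than fixing it in advance), here is a minimal Python sketch. It is not the framework described above: it swaps in plain average-linkage agglomerative clustering that stops merging once the best pairwise score falls below a threshold. The function names, the equal weights, the threshold of 0.2, and the toy paper records are all assumptions made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard overlap of two collections; 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_score(p, q, w_attr=0.5, w_rel=0.5):
    """Blend attribute evidence (title words) with relationship evidence
    (shared coauthors). The weights are illustrative assumptions."""
    attr = jaccard(p["title"].lower().split(), q["title"].lower().split())
    rel = jaccard(p["coauthors"], q["coauthors"])
    return w_attr * attr + w_rel * rel


def cluster_score(c1, c2, papers):
    """Average-linkage score between two clusters of paper indices."""
    scores = [pair_score(papers[i], papers[j]) for i in c1 for j in c2]
    return sum(scores) / len(scores)


def disambiguate(papers, threshold=0.2):
    """Greedily merge the most similar clusters until no pair reaches
    `threshold`. The number of clusters left is the estimate of K."""
    clusters = [[i] for i in range(len(papers))]
    while len(clusters) > 1:
        (a, b), best = max(
            (
                ((a, b), cluster_score(clusters[a], clusters[b], papers))
                for a, b in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        # combinations() guarantees a < b, so deleting index b is safe.
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical papers that all carry the same ambiguous author name.
papers = [
    {"title": "Topic models for social network search",
     "coauthors": ["A. Chen", "B. Liu"]},
    {"title": "Social network search with topic models",
     "coauthors": ["A. Chen"]},
    {"title": "Protein folding simulation methods",
     "coauthors": ["C. Diaz"]},
    {"title": "Simulation of protein folding pathways",
     "coauthors": ["C. Diaz", "D. Evans"]},
]

clusters = disambiguate(papers)
print(len(clusters), clusters)
```

In this sketch the threshold plays the role that a principled model-selection criterion would play in a probabilistic formulation: a lower threshold merges more aggressively and yields a smaller K, a higher one yields a larger K.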