Users of digital libraries usually want to know exactly who wrote an article. However, different authors may share the same name, either in full or as the same initials and last name (cases where an author changes name entirely are not considered here). In such cases, the user would like the digital library to distinguish among these authors. Name disambiguation helps in many situations, for example when a user searches for all articles written by a particular author. Disambiguation also enables better bibliometric analysis by allowing publications and citations to be counted and grouped more accurately. In this paper, we describe an algorithm for pair-wise disambiguation of author names based on random forests, a machine learning classification algorithm. We define a set of similarity profile features to assist in author disambiguation. Our experiments on the Medline database show that the random forest model outperforms previously proposed techniques, such as those using support vector machines (SVMs). In addition, we show that the variable importance produced by the random forest model can be used for feature selection with little degradation in disambiguation accuracy. In particular, two features alone, the inverse document frequency (IDF) of the author's last name and the similarity of the middle names, achieve an accuracy of almost 90%.
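The abstract does not spell out how a similarity profile is computed. The following is a minimal sketch, assuming a hypothetical `AuthorRecord` layout and hypothetical helpers (`last_name_idf`, `middle_name_sim`, `jaccard`, `pair_features`), of how a pair of author occurrences could be turned into a numeric feature vector for a pair-wise classifier. Only the two features named above (last-name IDF and middle-name similarity) come from the source; the first-name match, the co-author Jaccard score, and the particular middle-name scoring scheme are illustrative assumptions, not the paper's definitions.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AuthorRecord:
    """Hypothetical layout: one author name as it appears on one article."""
    last: str
    first: str
    middle: str = ""
    coauthors: frozenset = field(default_factory=frozenset)


def last_name_idf(records):
    """Build an IDF lookup for last names, treating each record as a document."""
    n = len(records)
    df = Counter(r.last.lower() for r in records)

    def idf(last):
        # Unseen names fall back to df = 1, i.e. the highest possible IDF.
        return math.log(n / df.get(last.lower(), 1))

    return idf


def middle_name_sim(a, b):
    """Assumed middle-name score in [0, 1]; not the paper's exact definition."""
    a, b = a.strip(". ").lower(), b.strip(". ").lower()
    if not a or not b:
        return 0.5   # at least one middle name missing: uninformative
    if a == b:
        return 1.0   # identical full names or identical initials
    if a[0] == b[0] and (len(a) == 1 or len(b) == 1):
        return 0.75  # an initial that is consistent with a full name
    return 0.0       # conflicting middle names


def jaccard(x, y):
    """Overlap of two sets; 0.0 when both are empty."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def pair_features(r1, r2, idf):
    """Similarity profile for a candidate pair that shares a last name."""
    return [
        idf(r1.last),                     # rare last names are less ambiguous
        middle_name_sim(r1.middle, r2.middle),
        1.0 if r1.first.lower() == r2.first.lower() else 0.0,  # assumed feature
        jaccard(r1.coauthors, r2.coauthors),                   # assumed feature
    ]


# Toy collection; names and co-authors are made up for illustration.
records = [
    AuthorRecord("Smith", "John", "A", frozenset({"lee", "chen"})),
    AuthorRecord("Smith", "John", "Adam", frozenset({"lee"})),
    AuthorRecord("Smith", "Jane", "", frozenset({"kumar"})),
    AuthorRecord("Zubrowski", "Ewa", "", frozenset({"novak"})),
]
idf = last_name_idf(records)
print([round(v, 3) for v in pair_features(records[0], records[1], idf)])
# -> [0.288, 0.75, 1.0, 0.5]
print([round(v, 3) for v in pair_features(records[0], records[2], idf)])
# -> [0.288, 0.5, 0.0, 0.0]
```

In a setup like the one described, vectors of this kind, each labeled "same author" or "different authors", would form the training data for a random forest. In scikit-learn, for instance, `RandomForestClassifier` exposes impurity-based importances through `feature_importances_`, and `sklearn.inspection.permutation_importance` computes permutation-based ones; either could be used to rank features for selection, although the abstract does not say which form of variable importance the authors used.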