Background: We recently described “Author-ity,” a model for estimating the probability that two articles in MEDLINE, sharing the same author name, were written by the same individual. Features include shared title words, journal name, coauthors, medical subject headings, language, affiliations, and author name features (middle initial, suffix, and prevalence in MEDLINE). Here we test the hypothesis that the Author-ity model will suffice to disambiguate author names for the vast majority of articles in MEDLINE.
Methods: Enhancements include: (a) incorporating first names and their variants, email addresses, and correlations between specific last names and affiliation words; (b) new methods of generating large unbiased training sets; (c) new methods for estimating the prior probability; (d) a weighted least squares algorithm for correcting transitivity violations; and (e) a maximum likelihood based agglomerative algorithm for computing clusters of articles that represent inferred author-individuals.
Results: Pairwise comparisons were computed for all author names that share a last name and first initial across all 15.3 million articles in MEDLINE (2006 baseline), creating Author-ity 2006, a database in which each name on each article is assigned to one of 6.7 million inferred author-individual clusters. Recall is estimated at ∼98.8%. Lumping (putting two different individuals into the same cluster) affects ∼0.5% of clusters, whereas splitting (assigning articles written by the same individual to >1 cluster) affects ∼2% of articles.
Impact: The Author-ity model can be applied generally to other bibliographic databases. Author name disambiguation allows information retrieval and data integration to become person-centered, not just document-centered, setting the stage for new data mining and social network tools that will facilitate the analysis of scholarly publishing and collaboration behavior.
Availability: The Author-ity 2006 database is available for nonprofit academic research, and can be freely queried via http://arrowsmith.psych.uic.edu.
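
Illustration: The blocking step and the two error types named in the Results can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the Author-ity implementation: the function names, the "Last, First" name format, and the exact way the lumping and splitting rates are counted are all hypothetical choices made for illustration, and the pairwise probability model itself is not reproduced.

```python
from collections import defaultdict


def block_key(name):
    """Blocking key (last name, first initial), lowercased.

    Assumes names are written as 'Last, First ...' (an illustrative format).
    """
    last, _, first = name.partition(",")
    return (last.strip().lower(), first.strip()[:1].lower())


def block_records(records):
    """Group (article_id, author_name) pairs by blocking key.

    Only names inside the same block would be compared pairwise.
    """
    blocks = defaultdict(list)
    for article_id, name in records:
        blocks[block_key(name)].append(article_id)
    return dict(blocks)


def lumping_rate(predicted, truth):
    """Fraction of predicted clusters that contain more than one true individual."""
    people_in = defaultdict(set)
    for article, cluster in predicted.items():
        people_in[cluster].add(truth[article])
    lumped = sum(1 for people in people_in.values() if len(people) > 1)
    return lumped / len(people_in)


def splitting_rate(predicted, truth):
    """Fraction of articles whose true author is spread over more than one cluster.

    This is one possible reading of 'splitting affects N% of articles'.
    """
    clusters_of = defaultdict(set)
    for article, cluster in predicted.items():
        clusters_of[truth[article]].add(cluster)
    split = sum(1 for a in predicted if len(clusters_of[truth[a]]) > 1)
    return split / len(predicted)


if __name__ == "__main__":
    # Hypothetical toy data: article ids with author name strings.
    records = [(1, "Smith, John"), (2, "Smith, J"), (3, "Smith, Jane A"),
               (4, "Smithe, John"), (5, "Smith, Robert")]
    print(block_records(records))

    # Hypothetical predicted cluster labels and true individuals per article.
    predicted = {1: "c1", 2: "c1", 3: "c2", 4: "c3"}
    truth = {1: "A", 2: "B", 3: "A", 4: "C"}
    print(lumping_rate(predicted, truth), splitting_rate(predicted, truth))
```

Blocking on last name and first initial keeps the number of pairwise comparisons manageable, at the cost of never merging records whose names fall into different blocks (for example, spelling variants of a last name).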