A hierarchical naive Bayes mixture model for name disambiguation in author citations

Authors:
Hui Han;Wei Xu;Hongyuan Zha;C. Lee Giles
Affiliations:
Yahoo Inc., Sunnyvale, CA;NEC Laboratories America, Inc., Cupertino, CA;The Pennsylvania State University, PA;The Pennsylvania State University, PA
Venue:
Proceedings of the 2005 ACM symposium on Applied computing
Year:
2005

Abstract

Because of name variations, an author may have multiple names, and multiple authors may share the same name. Such name ambiguity affects the performance of document retrieval, web search, and database integration, and may cause improper attribution to authors. This paper presents a hierarchical naive Bayes mixture model, an unsupervised learning approach, for name disambiguation in author citations. The method partitions a collection of citations into clusters, with each cluster containing only citations by the same author, thereby disambiguating authorship in citations to induce author name identities. Three types of citation features are used: co-author names, paper title words, and journal or proceedings title words. The approach is illustrated with 16 name datasets constructed from publication lists collected from author homepages and the DBLP computer science bibliography.
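
To make the clustering idea in the abstract concrete, the following is a minimal sketch of a naive Bayes mixture fitted with EM over the three feature types named above. It is not the paper's model: the abstract does not give the hierarchical structure, the estimation procedure, or the smoothing, so the function name fit_mixture, the field names, the Laplace smoothing parameter alpha, the random soft initialisation, and the toy citations are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict

# The three citation feature types named in the abstract (field names are hypothetical).
FIELDS = ("coauthors", "title", "venue")


def fit_mixture(citations, k, n_iter=50, alpha=1.0, seed=0):
    """Fit a k-component naive Bayes mixture with EM.

    Returns (labels, resp): the hard cluster label of each citation and the
    soft cluster responsibilities. Illustrative sketch only.
    """
    rng = random.Random(seed)
    n = len(citations)
    vocab = {f: sorted({w for c in citations for w in c[f]}) for f in FIELDS}

    # Random soft initialisation; each row is normalised to sum to 1.
    resp = []
    for _ in range(n):
        row = [rng.random() + 1e-3 for _ in range(k)]
        total = sum(row)
        resp.append([r / total for r in row])

    for _ in range(n_iter):
        # M-step: smoothed cluster priors and per-field word distributions.
        prior = [
            (sum(resp[i][j] for i in range(n)) + alpha) / (n + k * alpha)
            for j in range(k)
        ]
        log_theta = {}
        for f in FIELDS:
            counts = [defaultdict(float) for _ in range(k)]
            totals = [0.0] * k
            for i, c in enumerate(citations):
                for w in c[f]:
                    for j in range(k):
                        counts[j][w] += resp[i][j]
                        totals[j] += resp[i][j]
            v = len(vocab[f])
            log_theta[f] = [
                {w: math.log((counts[j][w] + alpha) / (totals[j] + alpha * v))
                 for w in vocab[f]}
                for j in range(k)
            ]

        # E-step: posterior over clusters for each citation, in log space.
        for i, c in enumerate(citations):
            logp = [
                math.log(prior[j])
                + sum(log_theta[f][j][w] for f in FIELDS for w in c[f])
                for j in range(k)
            ]
            peak = max(logp)
            unnorm = [math.exp(lp - peak) for lp in logp]
            total = sum(unnorm)
            resp[i] = [u / total for u in unnorm]

    labels = [max(range(k), key=lambda j: resp[i][j]) for i in range(n)]
    return labels, resp


# Hypothetical toy citations that share one ambiguous author name.
toy = [
    {"coauthors": ["a. smith"], "title": ["neural", "networks"], "venue": ["nips"]},
    {"coauthors": ["a. smith"], "title": ["deep", "networks"], "venue": ["nips"]},
    {"coauthors": ["b. jones"], "title": ["query", "optimization"], "venue": ["vldb"]},
    {"coauthors": ["b. jones"], "title": ["query", "indexing"], "venue": ["vldb"]},
]
labels, resp = fit_mixture(toy, k=2)
```

Each cluster in this sketch stands for one candidate author identity, and the number of clusters k is supplied by the caller. The three fields are treated as conditionally independent given the cluster, which is the naive Bayes assumption. Results depend on the seed and the initialisation, so the sketch makes no claim about which citations end up together.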