Two supervised learning approaches for name disambiguation in author citations

Authors:
Hui Han;Lee Giles;Hongyuan Zha;Cheng Li;Kostas Tsioutsiouliklis
Affiliations:
The Pennsylvania State University, University Park, PA;The Pennsylvania State University, University Park, PA;The Pennsylvania State University, University Park, PA;Harvard School of Public Health, Boston, MA;NEC Laboratories America, Princeton, NJ
Venue:
Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries
Year:
2004

Abstract

Due to name abbreviations, identical names, name misspellings, and pseudonyms in publications or bibliographies (citations), an author may have multiple names and multiple authors may share the same name. Such name ambiguity affects the performance of document retrieval, web search, and database integration, and may cause improper attribution to authors. This paper investigates two supervised learning approaches to disambiguate authors in citations. One approach uses the naive Bayes probability model, a generative model; the other uses Support Vector Machines (SVMs) and the vector space representation of citations, a discriminative model. Both approaches utilize three types of citation attributes: co-author names, the title of the paper, and the title of the journal or proceedings. We illustrate these two approaches on two types of data: one collected from the web, mainly publication lists from homepages, and the other collected from the DBLP citation databases.
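
To make the generative approach concrete, the following is a minimal sketch of a naive Bayes classifier that assigns a citation to one of several candidate authors sharing a name, using the three attribute types named in the abstract. It is not the paper's implementation: the attribute-tagged token scheme, the Laplace smoothing parameter `alpha`, the class name `NaiveBayesDisambiguator`, and all toy citations and author labels are assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict


def tokens(citation):
    """Turn one citation into attribute-tagged tokens (hypothetical feature scheme)."""
    feats = [("coauthor", name.lower()) for name in citation["coauthors"]]
    feats += [("title", w) for w in citation["title"].lower().split()]
    feats += [("venue", w) for w in citation["venue"].lower().split()]
    return feats


class NaiveBayesDisambiguator:
    """Multinomial naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # smoothing strength (assumed)
        self.class_counts = Counter()               # citations seen per author
        self.feature_counts = defaultdict(Counter)  # token counts per author
        self.vocab = set()

    def fit(self, citations, authors):
        for citation, author in zip(citations, authors):
            self.class_counts[author] += 1
            for f in tokens(citation):
                self.feature_counts[author][f] += 1
                self.vocab.add(f)
        return self

    def log_score(self, citation, author):
        # log P(author) + sum over tokens of log P(token | author)
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[author] / total)
        counts = self.feature_counts[author]
        denom = sum(counts.values()) + self.alpha * len(self.vocab)
        for f in tokens(citation):
            score += math.log((counts[f] + self.alpha) / denom)
        return score

    def predict(self, citation):
        return max(self.class_counts, key=lambda a: self.log_score(citation, a))


# Toy training data: two distinct people who both appear as "J. Smith".
# All names, titles, and venues below are invented for this example.
train = [
    ({"coauthors": ["A. Kumar", "B. Lee"],
      "title": "query optimization in relational databases",
      "venue": "toy database conference"}, "j_smith_1"),
    ({"coauthors": ["A. Kumar"],
      "title": "indexing methods for large databases",
      "venue": "toy database conference"}, "j_smith_1"),
    ({"coauthors": ["C. Rossi"],
      "title": "protein folding simulation",
      "venue": "toy biology journal"}, "j_smith_2"),
    ({"coauthors": ["C. Rossi", "D. Chen"],
      "title": "gene expression in yeast",
      "venue": "toy biology journal"}, "j_smith_2"),
]

model = NaiveBayesDisambiguator().fit(
    [citation for citation, _ in train],
    [author for _, author in train],
)

query = {"coauthors": ["A. Kumar"],
         "title": "join ordering for databases",
         "venue": "toy database conference"}
predicted = model.predict(query)
print(predicted)
```

The discriminative alternative described in the abstract would instead map each citation to a vector over the same three attribute types and train an SVM on those vectors; that variant is not shown here.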