Resolving homonymy with correlation clustering in scholarly digital libraries

Authors:
Jeongin Ju;Hosung Park;Sue Moon
Affiliations:
KAIST, Daejeon, South Korea;KAIST, Daejeon, South Korea;KAIST, Daejeon, South Korea
Venue:
Proceedings of the 22nd international conference on World Wide Web companion
Year:
2013

Citing 10
Cited 0

Two supervised learning approaches for name disambiguation in author citations

Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries
Computing Clusters of Correlation Connected objects

SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Name disambiguation in author citations using a K-way spectral clustering method

Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries
Efficient topic-based unsupervised name disambiguation

Proceedings of the 7th ACM/IEEE-CS joint conference on Digital libraries
Social Network Extraction of Academic Researchers

ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
On co-authorship for author disambiguation

Information Processing and Management: an International Journal
Named entity disambiguation by leveraging wikipedia semantic knowledge

Proceedings of the 18th ACM conference on Information and knowledge management
Web personal name disambiguation based on reference entity tables mined from the web

Proceedings of the eleventh international workshop on Web information and data management
Effective self-training author name disambiguation in scholarly digital libraries

Proceedings of the 10th annual joint conference on Digital libraries
Author name disambiguation

Annual Review of Information Science and Technology

Quantified Score

Hi-index	0.00

Visualization

Abstract

As scholarly data increases rapidly, scholarly digital libraries, supplying publication data through convenient online interfaces, become popular and important tools for researchers. Researchers use SDLs for various purposes, including searching the publications of an author, assessing one's impact by the citations, and identifying one's research topics. However, common names among authors cause difficulties in correctly identifying one's works among a large number of scholarly publications. Abbreviated first and middle names make it even harder to identify and distinguish authors with the same representation (i.e. spelling) of names. Several disambiguation methods have solved the problem under their own assumptions. The assumptions are usually that inputs such as the number of same-named authors, training sets, or rich and clear information about papers are given. Considering the size of scholarship records today and their inconsistent formats, we expect their assumptions be very hard to be met. We use common assumption that coauthors are likely to write more than one paper together and propose an unsupervised approach to group papers from the same author only using the most common information, author lists. We represent each paper as a point in an author name space, take dimension reduction to find author names shown frequently together in papers, and cluster papers with vector similarity measure well fitted for name disambiguation task. The main advantage of our approach is to use only coauthor information as input. We evaluate our method using publication records collected from DBLP, and show that our approach results in better disambiguation compared to other five clustering methods in terms of cluster purity and fragmentation.