Ambiguous author query detection using crowdsourced digital library annotations

  • Authors: Xiaoling Sun; Jasleen Kaur; Lino Possamai; Filippo Menczer

  • Affiliations: Dept. of Computer Science and Technology, Dalian University of Technology, China and School of Informatics and Computing, Indiana University, Bloomington, USA; School of Informatics and Computing, Indiana University, Bloomington, USA; Department of Pure and Applied Mathematics, University of Padua, Italy and School of Informatics and Computing, Indiana University, Bloomington, USA; School of Informatics and Computing, Indiana University, Bloomington, USA

  • Venue: Information Processing and Management: an International Journal
  • Year: 2013

Abstract

The name ambiguity problem is especially challenging in the field of bibliographic digital libraries. The problem is amplified when names are collected from heterogeneous sources. This is the case in the Scholarometer system, which performs bibliometric analysis by cross-correlating author names in user queries with those retrieved from digital libraries. The uncontrolled nature of user-generated annotations is very valuable, but creates the need to detect ambiguous names. Our goal is to detect ambiguous names at query time by mining digital library annotation data, thereby decreasing noise in the bibliometric analysis. We explore three kinds of heuristic features based on citations, metadata, and crowdsourced topics in a supervised learning framework. The proposed approach achieves almost 80% accuracy. Finally, we compare the performance of ambiguous author detection in Scholarometer using Google Scholar against a baseline based on Microsoft Academic Search.
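
As a rough illustration of the supervised setup described above, the following minimal sketch turns the papers returned for one author query into three features (one citation-based, one metadata-based, one topic-based) and fits a small logistic regression on labeled queries. The abstract does not specify the actual features or classifier, so everything here is an assumption made for illustration: the field names `citations`, `year`, and `tags`, the three feature definitions, the toy data, and the choice of logistic regression are all hypothetical.

```python
import math
from collections import Counter

def topic_entropy(tags):
    """Shannon entropy (bits) of crowdsourced topic tags; hypothetical topic feature."""
    counts = Counter(tags)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def extract_features(papers):
    """Map the papers returned for one name query to a 3-dimensional feature vector.

    Each paper is a dict with hypothetical keys 'citations', 'year', 'tags'.
    """
    if not papers:
        return [0.0, 0.0, 0.0]
    cites = [p["citations"] for p in papers]
    years = [p["year"] for p in papers]
    tags = [t for p in papers for t in p["tags"]]
    total = sum(cites)
    top_share = max(cites) / total if total else 0.0  # citation-based (assumed)
    year_span = (max(years) - min(years)) / 50.0      # metadata-based (assumed), scaled
    return [top_share, year_span, topic_entropy(tags)]

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, bias, x):
    """Estimated probability that the query name is ambiguous."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

def train(X, y, lr=0.5, epochs=5000):
    """Full-batch gradient descent on the logistic loss, starting from zero weights."""
    n, d = len(X), len(X[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, label in zip(X, y):
            err = predict_proba(weights, bias, x) - label
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Toy labeled queries (invented data): 1 = ambiguous name, 0 = unambiguous.
queries = [
    [{"citations": 10, "year": 2005, "tags": ["ml"]},
     {"citations": 5, "year": 2008, "tags": ["ml"]}],
    [{"citations": 10, "year": 1970, "tags": ["physics"]},
     {"citations": 10, "year": 2010, "tags": ["biology"]},
     {"citations": 10, "year": 1990, "tags": ["ml"]},
     {"citations": 10, "year": 2000, "tags": ["law"]}],
    [{"citations": 3, "year": 2010, "tags": ["nlp", "ml"]},
     {"citations": 1, "year": 2012, "tags": ["nlp"]}],
    [{"citations": 4, "year": 1980, "tags": ["chemistry"]},
     {"citations": 4, "year": 2012, "tags": ["ml"]},
     {"citations": 4, "year": 1995, "tags": ["history"]}],
]
labels = [0, 1, 0, 1]

X = [extract_features(q) for q in queries]
weights, bias = train(X, labels)
predictions = [int(predict_proba(weights, bias, x) >= 0.5) for x in X]
```

At query time, the same `extract_features` call would be applied to the papers retrieved for a new name, and a probability above a chosen threshold would flag the query as ambiguous. A real system would need far more labeled queries and held-out evaluation than this toy set provides.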