Author name disambiguation for ranking and clustering PubMed data using NetClus

  • Authors:
  • Arvin Varadharajalu;Wei Liu;Wilson Wong

  • Affiliations:
  • School of Computer Science and Software Engineering, The University of Western Australia, Australia;School of Computer Science and Software Engineering, The University of Western Australia, Australia;School of Computer Science and Information Technology, RMIT University, Australia

  • Venue:
  • AI'11 Proceedings of the 24th international conference on Advances in Artificial Intelligence
  • Year:
  • 2011

Abstract

The ranking and clustering of publication databases are often used to discover useful information about research areas. NetClus is an iterative algorithm for clustering heterogeneous star-schema information networks that incorporates the ranking information of individual data types. The algorithm has been evaluated using the DBLP database. In this paper, we apply NetClus to PubMed, a free database of articles on life sciences and biomedical topics, to discover key aspects of cancer research. The absence of unique identifiers for authors in PubMed introduces additional challenges. To address this, we introduce an improved author disambiguation technique that uses affiliation string normalisation based on a vector space model together with co-author networks. Our technique for disambiguating authors, which offers higher accuracy than existing techniques, significantly improves NetClus clustering results.
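
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea it names: comparing affiliation strings in a vector space model and combining that with co-author evidence. It is not the authors' method. The record layout, the smoothed TF-IDF weighting, the rule that merges two records when they share a co-author or when their affiliation similarity passes a threshold, and the threshold value of 0.5 are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(affiliation):
    """Lowercase an affiliation string and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", affiliation.lower())


def tfidf_vectors(affiliations):
    """Turn each affiliation string into a sparse TF-IDF vector (a dict)."""
    docs = [Counter(tokenize(a)) for a in affiliations]
    n = len(docs)
    # Document frequency: in how many affiliation strings each term occurs.
    df = Counter(term for doc in docs for term in doc)
    # Smoothed inverse document frequency (an illustrative choice).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def same_author(rec_a, rec_b, vec_a, vec_b, threshold=0.5):
    """Hypothetical merge rule: shared co-author OR similar affiliation."""
    shared = set(rec_a["coauthors"]) & set(rec_b["coauthors"])
    return bool(shared) or cosine(vec_a, vec_b) >= threshold


# Made-up records that share the ambiguous name "Liu W".
records = [
    {"name": "Liu W",
     "affiliation": "School of Computer Science, University of Western Australia, Perth",
     "coauthors": ["Wong W"]},
    {"name": "Liu W",
     "affiliation": "Univ. of Western Australia, School of Computer Science",
     "coauthors": ["Varadharajalu A"]},
    {"name": "Liu W",
     "affiliation": "Department of Oncology, Peking University, Beijing",
     "coauthors": ["Zhang Y"]},
]
vectors = tfidf_vectors([r["affiliation"] for r in records])
```

In this toy data, the first two records have no co-author in common but their affiliation strings differ mainly in word order and abbreviation, so the affiliation similarity alone links them; the third record matches on neither signal. A real pipeline would need a tuned threshold and a clustering step over all pairwise decisions.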