Computer models for identifying instrumental citations in the biomedical literature

  • Authors:
  • Lawrence D. Fu;Yindalon Aphinyanaphongs;Constantin F. Aliferis

  • Affiliations:
  • Department of Medicine, Center for Health Informatics and Bioinformatics, New York University Medical Center, New York, USA 10016;Department of Medicine, Center for Health Informatics and Bioinformatics, New York University Medical Center, New York, USA 10016;Department of Pathology, Center for Health Informatics and Bioinformatics, New York University Medical Center, New York, USA 10016

  • Venue:
  • Scientometrics
  • Year:
  • 2013

Abstract

The most popular method for evaluating the quality of a scientific publication is citation count. This metric assumes that a citation is a positive indicator of the quality of the cited work. This assumption does not always hold, because citations serve many purposes. As a result, citation count is an indirect and imprecise measure of impact. If instrumental citations could be reliably distinguished from non-instrumental ones, existing citation-based metrics could be improved by excluding the non-instrumental citations. A citation was operationally defined as instrumental if either of the following was true: the hypothesis of the citing work was motivated by the cited work, or the citing work could not have been executed without the cited work. This work investigated the feasibility of developing computer models for automatically classifying citations as instrumental or non-instrumental. Citations were manually labeled as instrumental or non-instrumental, and machine learning models were trained on a combination of content and bibliometric features. The experimental results indicate that models based on content and bibliometric features can automatically classify instrumental citations with high predictivity (AUC = 0.86). Additional experiments using independent held-out data and prospective validation show that the models are generalizable and can handle unseen cases. This work demonstrates that it is feasible to train computer models to automatically identify instrumental citations.
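
To make the reported metric concrete, the following is a minimal sketch of how an AUC value can be computed from classifier scores, and of how a citation count might be restricted to citations predicted to be instrumental. It is not the authors' implementation: the function names `auc` and `instrumental_count`, the 0.5 threshold, and the toy labels and scores are all assumptions made for illustration.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive (instrumental) example receives a higher score than a
    randomly chosen negative one. Ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def instrumental_count(scores: Sequence[float], threshold: float = 0.5) -> int:
    """Count citations whose predicted score meets a (hypothetical) threshold,
    i.e. a citation count that excludes predicted non-instrumental citations."""
    return sum(1 for s in scores if s >= threshold)


# Toy data (invented for illustration): 1 = instrumental, 0 = non-instrumental.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

print(auc(labels, scores))         # ranking quality of the scores
print(len(scores))                 # raw citation count
print(instrumental_count(scores))  # count restricted to predicted instrumental citations
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so the reported 0.86 means that a labeled instrumental citation outranks a non-instrumental one in roughly 86% of such pairs.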