Computer models for identifying instrumental citations in the biomedical literature

Authors:
Lawrence D. Fu;Yindalon Aphinyanaphongs;Constantin F. Aliferis
Affiliations:
Department of Medicine, Center for Health Informatics and Bioinformatics, New York University Medical Center, New York, USA 10016;Department of Medicine, Center for Health Informatics and Bioinformatics, New York University Medical Center, New York, USA 10016;Department of Pathology, Center for Health Informatics and Bioinformatics, New York University Medical Center, New York, USA 10016
Venue:
Scientometrics
Year:
2013

Abstract

The most popular method for evaluating the quality of a scientific publication is citation count. This metric assumes that a citation is a positive indicator of the quality of the cited work. That assumption does not always hold, because citations serve many purposes. As a result, citation count is an indirect and imprecise measure of impact. If instrumental citations could be reliably distinguished from non-instrumental ones, existing citation-based metrics could be improved by excluding the non-instrumental citations. A citation was operationally defined as instrumental if either of the following was true: the hypothesis of the citing work was motivated by the cited work, or the citing work could not have been executed without the cited work. This work investigated the feasibility of developing computer models for automatically classifying citations as instrumental or non-instrumental. Instrumental citations were manually labeled, and machine learning models were trained on a combination of content and bibliometric features. The experimental results indicate that models based on content and bibliometric features can automatically classify instrumental citations with high predictive performance (AUC = 0.86). Additional experiments using independent held-out data and prospective validation show that the models are generalizable and can handle unseen cases. This work demonstrates that it is feasible to train computer models to automatically identify instrumental citations.
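
The following is a minimal illustrative sketch of three ideas in the abstract: the either-or operational definition of an instrumental citation, a citation count restricted to citations a model flags as instrumental, and AUC as an evaluation measure. It is not the authors' implementation. The `Citation` record, its field names, the 0.5 threshold, and the toy scores are all assumptions made for illustration, and AUC is computed here with the standard pairwise-ranking formulation.

```python
from dataclasses import dataclass


@dataclass
class Citation:
    # Hypothetical record; field names are illustrative, not taken from the paper.
    motivated_hypothesis: bool    # citing work's hypothesis was motivated by the cited work
    required_for_execution: bool  # citing work could not have been executed without it
    score: float                  # assumed model output; higher = more likely instrumental


def is_instrumental(c: Citation) -> bool:
    """Operational definition from the abstract: either criterion suffices."""
    return c.motivated_hypothesis or c.required_for_execution


def instrumental_count(citations, threshold=0.5):
    """Citation count restricted to citations scored at or above an assumed threshold."""
    return sum(1 for c in citations if c.score >= threshold)


def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (fabricated purely for the example, not results from the paper).
citations = [
    Citation(True, False, 0.9),
    Citation(False, True, 0.4),
    Citation(False, False, 0.6),
    Citation(False, False, 0.1),
]
labels = [is_instrumental(c) for c in citations]  # manual labels under the definition
scores = [c.score for c in citations]

print("raw citation count:", len(citations))
print("instrumental citation count:", instrumental_count(citations))
print("AUC on toy data:", auc(labels, scores))
```

On this toy data the model ranks three of the four positive-negative pairs correctly, so the AUC is 0.75, and two of the four citations pass the threshold. The pairwise AUC is quadratic in the number of examples, which is acceptable for a sketch but would normally be replaced by a rank-based computation on larger datasets.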