Information Extraction as Link Prediction: Using Curated Citation Networks to Improve Gene Detection

  • Authors:
  • Andrew Arnold;William W. Cohen

  • Affiliations:
  • Machine Learning Department, Carnegie Mellon University;Machine Learning Department, Carnegie Mellon University

  • Venue:
  • WASA '09 Proceedings of the 4th International Conference on Wireless Algorithms, Systems, and Applications
  • Year:
  • 2009

Abstract

In this paper we explore the usefulness of various types of publication-related metadata, such as citation networks and curated databases, for the task of identifying genes in academic biomedical publications. Specifically, we examine whether knowing which genes an author has previously written about, combined with information about previous coauthors and citations, can help us predict which new genes the author is likely to write about in the future. Framed in this way, the problem becomes one of predicting links between authors and genes in the publication network. We show that this purely social-network-based link prediction technique outperforms various baselines, including those relying only on non-social biological information.
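
To make the link-prediction framing concrete, the following is a minimal sketch, not the method used in the paper. It assumes a toy scoring rule (count how many of an author's coauthors or cited authors have already written about a gene), and the function name `rank_candidate_genes`, the data layout, and all author and gene names are hypothetical.

```python
from collections import Counter


def rank_candidate_genes(author, author_genes, neighbors):
    """Rank genes the author has not yet written about.

    author_genes: dict mapping author -> set of genes already written about
    neighbors:    dict mapping author -> set of coauthors and cited authors
    The neighbor-count score is an illustrative assumption only.
    """
    known = author_genes.get(author, set())
    scores = Counter()
    for other in neighbors.get(author, set()):
        for gene in author_genes.get(other, set()):
            if gene not in known:
                scores[gene] += 1
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy data, entirely made up for illustration.
author_genes = {
    "alice": {"GENE_A"},
    "bob": {"GENE_A", "GENE_B"},
    "carol": {"GENE_B", "GENE_C"},
}
neighbors = {"alice": {"bob", "carol"}}

ranking = rank_candidate_genes("alice", author_genes, neighbors)
print(ranking)
```

Each (author, gene) pair with a nonzero score is a candidate link. A real system would replace the neighbor count with a scoring function learned or derived from the full publication network.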