Inferring pathways from gene lists using a literature-derived network of biological relationships

  • Authors:
  • Dilip Rajagopalan; Pankaj Agarwal

  • Affiliations:
  • Bioinformatics Sciences, GlaxoSmithKline Pharmaceuticals R&D, 709 Swedeland Road, UW2230, King of Prussia, PA 19406-0939, USA; Bioinformatics Sciences, GlaxoSmithKline Pharmaceuticals R&D, 709 Swedeland Road, UW2230, King of Prussia, PA 19406-0939, USA

  • Venue:
  • Bioinformatics
  • Year:
  • 2005

Quantified Score

Hi-index 3.84

Abstract

Motivation: A number of omic technologies, such as transcriptional profiling, proteomics, literature searches and genetic association, help in the identification of sets of important genes. A subset of these genes may act in a coordinated manner, possibly because they are part of the same biological pathway. Interpreting such gene lists and relating them to pathways is a challenging task. Databases of biological relationships between thousands of mammalian genes can help in deciphering omics data. The relationships between genes can be assembled into a biological network with each protein as a node and each relationship as an edge between two proteins (or nodes). This network may then be searched for subnetworks consisting largely of interesting genes from the omics experiment. The subset of genes in the subnetwork, along with the web of relationships between them, helps to decipher the underlying pathways. Finding subnetworks that include as many proteins from the query set as possible, but few others, is the focus of this paper.

Results: We present a heuristic algorithm and a scoring function that work well both on simulated data and on data from known pathways. The scoring function is an extension of a previous study for a single biological experiment. We use a simple set of heuristics that provide a more efficient solution than the simulated annealing method. We find that our method works on reasonably complex curated networks containing ∼9000 biological entities (genes and metabolites) and ∼30 000 biological relationships. We also show that our method can pick up a pathway signal from a query list that includes a moderate number of genes unrelated to the pathway. In addition, we quantify the sensitivity and specificity of the technique.

Contact: dilip_rajagopalan@gsk.com
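
To make the problem statement concrete, the following minimal Python sketch builds a network from pairwise relationships and grows a subnetwork that covers query genes while admitting few non-query nodes. It is not the authors' heuristic or scoring function, neither of which is specified in the abstract. The greedy growth rule, the `max_linkers` limit, the `query_fraction` score and the toy gene names are all assumptions made for illustration.

```python
from collections import deque


def build_network(relationships):
    """Undirected adjacency map: each entity is a node, each relationship an edge."""
    adj = {}
    for a, b in relationships:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def shortest_path(adj, source, targets):
    """Breadth-first search from source to the nearest node in targets."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node in targets:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(adj.get(node, ())):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def grow_subnetwork(adj, query, seed, max_linkers=1):
    """Greedily attach query genes that reach the subnetwork through
    at most max_linkers non-query nodes (an illustrative rule only)."""
    query = set(query) & set(adj)
    sub = {seed}
    changed = True
    while changed:
        changed = False
        for gene in sorted(query - sub):
            if gene in sub:  # already pulled in by an earlier path
                continue
            path = shortest_path(adj, gene, sub)
            if path is None:
                continue
            linkers = [n for n in path[1:-1] if n not in query]
            if len(linkers) <= max_linkers:
                sub.update(path)
                changed = True
    return sub


def query_fraction(sub, query):
    """Hypothetical score: share of subnetwork nodes that are query genes."""
    return len(sub & set(query)) / len(sub)


# Toy network: A-B-C form a chain, D hangs off C via linker X,
# E is two linkers away from D, and F sits in a separate component.
edges = [("A", "B"), ("B", "C"), ("C", "X"), ("X", "D"),
         ("D", "Y"), ("Y", "Z"), ("Z", "E"), ("F", "G")]
query = {"A", "B", "C", "D", "E", "F"}

network = build_network(edges)
sub = grow_subnetwork(network, query, seed="A", max_linkers=1)
score = query_fraction(sub, query)
print(sorted(sub), score)
```

In this toy case the subnetwork absorbs one non-query linker (`X`) to reach `D`, but leaves out `E` and `F`, which would need too many unrelated nodes or cannot be reached at all. This mirrors the trade-off described above between covering the query set and excluding other proteins.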