Dense subgraphs with restrictions and applications to gene annotation graphs

  • Authors:
  • Barna Saha; Allison Hoch; Samir Khuller; Louiqa Raschid; Xiao-Ning Zhang

  • Affiliations:
  • Department of Computer Science, University of Maryland, College Park, MD (research supported by NSF Award CCF-0728839); Department of Computer Science, University of Maryland, College Park, MD (research supported by an NSF REU Supplement to Award CCF-0728839); Department of Computer Science and UMIACS, University of Maryland, College Park, MD (research supported by NSF Award CCF-0728839 and a Google Research Award); UMIACS and Robert H Smith School of Business, University of Maryland, College Park, MD (research supported by NSF Awards IIS-0430915 and IIS-0960963); Department of Biology, St Bonaventure University, St Bonaventure, NY

  • Venue:
  • RECOMB'10: Proceedings of the 14th Annual International Conference on Research in Computational Molecular Biology
  • Year:
  • 2010

Abstract

In this paper, we focus on finding complex annotation patterns in gene annotation data that represent novel and interesting hypotheses. We define a generalization of the densest subgraph problem by adding a distance restriction (defined by a separate metric) on the nodes of the subgraph. We show that while this generalization makes the problem NP-hard for arbitrary metrics, the problem can be solved optimally in polynomial time when the metric is the distance metric of a tree or of an interval graph. We also show that the densest subgraph problem in which a specified subset of vertices must be included in the solution can be solved optimally in polynomial time. In addition, we consider extensions in which, rather than finding a single solution, we wish to list all subgraphs of almost maximum density. We apply this method to a dataset of genes and their annotations obtained from The Arabidopsis Information Resource (TAIR). A user evaluation confirms that the patterns found in the distance-restricted densest subgraph for a dataset of photomorphogenesis genes are indeed validated in the literature; a control dataset shows that these are not random patterns. Interestingly, the complex annotation patterns may lead to new, as yet unknown hypotheses. We perform experiments to determine the properties of the dense subgraphs as we vary parameters, including the number of genes and the distance.
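The abstract states that the densest subgraph problem with a required vertex subset is solvable optimally in polynomial time, but does not say how. As an illustration only, here is a minimal sketch using a standard technique that is not necessarily the authors' method: Dinkelbach's parametric iteration combined with a min-cut (maximum-closure) construction that maximizes |E(S)| - g|S|. It assumes an undirected simple graph, takes density to mean |E(S)|/|S|, and forces each required vertex onto the source side with an effectively infinite source edge. The helper names `min_cut_source_side` and `densest_subgraph` are hypothetical, and neither the metric distance restriction nor the listing of near-densest subgraphs is modeled.

```python
from collections import deque
from fractions import Fraction


def min_cut_source_side(cap, s, t):
    """Edmonds-Karp max flow. `cap` is a dict of dicts of capacities and is
    mutated into the residual graph. Returns the nodes reachable from s in the
    final residual graph, i.e. the source side of a minimum s-t cut."""
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return set(parent)  # BFS exhausted: reachable set = source side
        # Bottleneck residual capacity along the s-t path found by BFS.
        bottleneck, v = None, t
        while parent[v] is not None:
            c = cap[parent[v]][v]
            bottleneck = c if bottleneck is None else min(bottleneck, c)
            v = parent[v]
        # Push flow: decrease forward residuals, increase reverse residuals.
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] = cap[v].get(u, 0) + bottleneck
            v = u


def densest_subgraph(vertices, edges, required=()):
    """Hypothetical helper: maximize |E(S)|/|S| over vertex sets S that
    contain every vertex in `required`. Returns (S, density)."""
    vertices = list(vertices)
    required = set(required)
    m = len(edges)

    def induced(S):
        return sum(1 for u, v in edges if u in S and v in S)

    # Dinkelbach iteration: start from a feasible set (all vertices).
    best = set(vertices)
    g = Fraction(induced(best), len(best))
    while True:
        s, t = ("source",), ("sink",)
        inf = g * len(vertices) + m + 1  # larger than any minimum cut
        cap = {s: {}, t: {}}
        for v in vertices:
            cap[("v", v)] = {t: g}  # keeping vertex v costs g
        for i, (u, v) in enumerate(edges):
            cap[s][("e", i)] = 1  # keeping edge i earns 1 ...
            cap[("e", i)] = {("v", u): inf, ("v", v): inf}  # ... only if u, v kept
        for r in required:
            cap[s][("v", r)] = inf  # r can never end up on the sink side
        side = min_cut_source_side(cap, s, t)
        S = {v for v in vertices if ("v", v) in side}
        # S maximizes |E(S)| - g|S| subject to S containing `required`.
        if induced(S) - g * len(S) <= 0:
            return best, g  # no feasible set is denser than g
        best, g = S, Fraction(induced(S), len(S))
```

For example, on a 4-clique a, b, c, d with a pendant path d-e-f, the unrestricted call should return the clique with density 3/2, while requiring f forces the whole graph (density 4/3). Exact `Fraction` arithmetic avoids floating-point ties when deciding whether a denser set exists; plain Edmonds-Karp is used for brevity rather than speed.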