Controlling size when aligning multiple genomic sequences with duplications

  • Authors:
  • Minmei Hou; Piotr Berman; Louxin Zhang; Webb Miller

  • Affiliations:
  • Department of Computer Science and Engineering, Penn State, University Park, PA; Department of Computer Science and Engineering, Penn State, University Park, PA; Department of Mathematics, National University of Singapore, Singapore; Department of Computer Science and Engineering, Penn State, University Park, PA

  • Venue:
  • WABI'06: Proceedings of the 6th International Conference on Algorithms in Bioinformatics
  • Year:
  • 2006

Abstract

For a genomic region containing a tandem gene cluster, a proper set of alignments needs to align only orthologous segments, i.e., those separated by a speciation event. Otherwise, methods for finding regions under evolutionary selection will not perform properly. Conversely, the alignments should indicate every orthologous pair of genes or genomic segments. Attaining this goal in practice requires a technique for avoiding a combinatorial explosion in the number of local alignments. To better understand this process, we model it as a graph problem of finding a minimum cardinality set of cliques that contain all edges. We provide an upper bound for an important class of graphs (the problem is NP-hard and very difficult to approximate in the general case), and use the bound and computer simulations to evaluate two heuristic solutions. An implementation of one of them is evaluated on mammalian sequences from the α-globin gene cluster.
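The graph formulation above, covering every edge with as few cliques as possible, is the minimum edge clique cover problem. The following is a minimal Python sketch of a generic greedy heuristic for it. It is an illustration under our own assumptions (an undirected graph given as an edge list, sortable vertex labels, no self-loops) and is not either of the two heuristics evaluated in the paper. The function name greedy_edge_clique_cover is hypothetical. Because the problem is NP-hard, the size of the cover it returns is only an upper bound on the minimum.

```python
from itertools import combinations


def greedy_edge_clique_cover(edges):
    """Cover every edge of an undirected graph with cliques (greedy sketch).

    Generic heuristic, not the paper's: repeatedly take an uncovered edge
    and grow it into a maximal clique, preferring vertices that cover the
    most still-uncovered edges. Returns a list of frozensets of vertices.
    """
    # Build adjacency sets; self-loops are assumed absent.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    uncovered = {frozenset(e) for e in edges}
    cover = []
    while uncovered:
        # Deterministic choice of a seed edge (vertices must be sortable).
        u, v = min(sorted(e) for e in uncovered)
        clique = {u, v}
        # Vertices adjacent to every member of the current clique.
        candidates = adj[u] & adj[v]
        while candidates:
            # Prefer the candidate that covers the most uncovered edges;
            # ties go to the smallest label because max() keeps the first.
            w = max(
                sorted(candidates),
                key=lambda x: sum(frozenset((x, y)) in uncovered for y in clique),
            )
            clique.add(w)
            candidates &= adj[w]
        # Mark every edge inside the clique as covered.
        for a, b in combinations(clique, 2):
            uncovered.discard(frozenset((a, b)))
        cover.append(frozenset(clique))
    return cover


# Hypothetical example: a triangle {1, 2, 3} with a pendant edge (3, 4).
print(greedy_edge_clique_cover([(1, 2), (2, 3), (1, 3), (3, 4)]))
# [frozenset({1, 2, 3}), frozenset({3, 4})]
```

Growing each seed edge into a maximal clique keeps every chosen set a valid clique while letting one clique absorb as many uncovered edges as possible. Different seed or tie-breaking rules can yield covers of different sizes.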