Improved recombination lower bounds for haplotype data

  • Authors:
  • Vineet Bafna;Vikas Bansal

  • Affiliations:
  • Department of Computer Science and Engineering, University of California at San Diego, La Jolla, CA;Department of Computer Science and Engineering, University of California at San Diego, La Jolla, CA

  • Venue:
  • RECOMB'05 Proceedings of the 9th Annual International Conference on Research in Computational Molecular Biology
  • Year:
  • 2005

Abstract

Recombination is an important evolutionary mechanism responsible for the genetic diversity in humans and other organisms. Recently, there has been extensive research on understanding the fine-scale variation in recombination rates across the human genome using DNA polymorphism data. A combinatorial approach to this problem is to estimate the minimum number of recombination events in any history of the sample. Myers and Griffiths [1] proposed two measures, Rh and Rs, that give lower bounds on the minimum number of recombination events. In this paper, we provide new and improved methods (both in terms of running time and ability to detect past recombination events) for computing recombination lower bounds. Our principal results include:

  • We show that computing the lower bound Rh is NP-hard and adapt the greedy algorithm for the set cover problem [2] to obtain a polynomial-time algorithm for computing a diversity-based bound Rg. This algorithm is several orders of magnitude faster than the Recmin program [1], and the bound Rg matches the bound Rh almost always.
  • We show that computing the lower bound Rs is also NP-hard, using a reduction from MAX-2SAT. We give an O(m 2^n) time algorithm for computing Rs for a dataset with n haplotypes and m SNPs.
  • We propose a new bound RI, which extends the history-based bound Rs using the notion of intermediate haplotypes. This bound detects more recombination events than both the Rh and Rs bounds on many real datasets.
  • We extend our algorithms for computing Rg and Rs to obtain lower bounds for haplotypes with missing data. These methods can detect more recombination events for the LPL dataset [3] than previous bounds and provide stronger evidence for the presence of a recombination hotspot.
  • We apply our lower bounds to a real dataset [4] and demonstrate that these can provide a good indication of the presence and the location of recombination hotspots.
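
The abstract refers to the greedy algorithm for the set cover problem [2], which the authors adapt to compute Rg. The adaptation itself is not described here, so the following is only a minimal sketch of the textbook greedy heuristic on an abstract universe, not the paper's Rg computation. The function name `greedy_set_cover`, its parameters, and the toy input are hypothetical and chosen for illustration.

```python
from typing import Dict, Hashable, List, Set


def greedy_set_cover(
    universe: Set[Hashable], subsets: Dict[str, Set[Hashable]]
) -> List[str]:
    """Return the names of the subsets picked by the classic greedy heuristic.

    At each step, pick the subset that covers the most still-uncovered
    elements. This is a generic illustration (an assumption of this sketch),
    not the adapted algorithm used in the paper to compute Rg.
    """
    uncovered = set(universe)
    chosen: List[str] = []
    while uncovered:
        # Iterate names in sorted order so that ties are broken deterministically.
        best = max(sorted(subsets), key=lambda name: len(subsets[name] & uncovered))
        gained = subsets[best] & uncovered
        if not gained:
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best)
        uncovered -= gained
    return chosen


if __name__ == "__main__":
    # Toy instance: elements 1..5 and four candidate subsets.
    toy_subsets = {"A": {1, 2, 3}, "B": {2, 4}, "C": {3, 4}, "D": {4, 5}}
    print(greedy_set_cover({1, 2, 3, 4, 5}, toy_subsets))
```

Each iteration scans every candidate subset once, so the running time is polynomial in the input size, consistent with the abstract's description of Rg as computable in polynomial time while Rh is NP-hard.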