Improved recombination lower bounds for haplotype data

Authors:
Vineet Bafna; Vikas Bansal
Affiliations:
Department of Computer Science and Engineering, University of California at San Diego, La Jolla, CA; Department of Computer Science and Engineering, University of California at San Diego, La Jolla, CA
Venue:
RECOMB'05 Proceedings of the 9th Annual international conference on Research in Computational Molecular Biology
Year:
2005

Citing 5
Cited 8

Computers and Intractability: A Guide to the Theory of NP-Completeness
Efficient Reconstruction of Phylogenetic Networks with Constrained Recombination

CSB '03 Proceedings of the IEEE Computer Society Conference on Bioinformatics
The Number of Recombination Events in a Sample History: Conflict Graph and Lower Bounds

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
Haplotype reconstruction from genotype data using Imperfect Phylogeny

Bioinformatics
Approximation algorithms for combinatorial problems

Journal of Computer and System Sciences

An efficiently computed lower bound on the number of recombinations in phylogenetic networks: Theory and empirical study

Discrete Applied Mathematics
Parsimony Score of Phylogenetic Networks: Hardness Results and a Linear-Time Heuristic

IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
A new linear-time heuristic algorithm for computing the parsimony score of phylogenetic networks: theoretical bounds and empirical performance

ISBRA'07 Proceedings of the 3rd international conference on Bioinformatics research and applications
Accurate computation of likelihoods in the coalescent with recombination via parsimony

RECOMB'08 Proceedings of the 12th annual international conference on Research in computational molecular biology
Generalizing the four gamete condition and splits equivalence theorem: perfect phylogeny on three state characters

WABI'09 Proceedings of the 9th international conference on Algorithms in bioinformatics
Minimum recombination histories by branch and bound

WABI'05 Proceedings of the 5th International conference on Algorithms in Bioinformatics
Algorithms to distinguish the role of gene-conversion from single-crossover recombination in the derivation of SNP sequences in populations

RECOMB'06 Proceedings of the 10th annual international conference on Research in Computational Molecular Biology
Integer programming formulations and computations solving phylogenetic and population genetic problems with missing or genotypic data

COCOON'07 Proceedings of the 13th annual international conference on Computing and Combinatorics

Abstract

Recombination is an important evolutionary mechanism responsible for the genetic diversity in humans and other organisms. Recently, there has been extensive research on understanding the fine-scale variation in recombination rates across the human genome using DNA polymorphism data. A combinatorial approach to this problem is to estimate the minimum number of recombination events in any history of the sample. Myers and Griffiths [1] proposed two measures, Rh and Rs, that give lower bounds on the minimum number of recombination events. In this paper, we provide new and improved methods (both in terms of running time and ability to detect past recombination events) for computing recombination lower bounds. Our principal results include:

We show that computing the lower bound Rh is NP-hard and adapt the greedy algorithm for the set cover problem [2] to obtain a polynomial-time algorithm for computing a diversity-based bound Rg. This algorithm is several orders of magnitude faster than the Recmin program [1], and the bound Rg almost always matches the bound Rh.

We show that computing the lower bound Rs is also NP-hard, using a reduction from MAX-2SAT. We give an O(m 2^n) time algorithm for computing Rs for a dataset with n haplotypes and m SNPs.

We propose a new bound RI, which extends the history-based bound Rs using the notion of intermediate haplotypes. This bound detects more recombination events than both the Rh and Rs bounds on many real datasets.

We extend our algorithms for computing Rg and Rs to obtain lower bounds for haplotypes with missing data. These methods can detect more recombination events for the LPL dataset [3] than previous bounds and provide stronger evidence for the presence of a recombination hotspot.

We apply our lower bounds to a real dataset [4] and demonstrate that they can provide a good indication of the presence and the location of recombination hotspots.
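
For readers unfamiliar with the greedy algorithm for set cover that the abstract says is adapted to compute Rg, the following is a minimal sketch of the textbook version only. It is not the authors' adaptation, and it does not compute any recombination bound. The function name `greedy_set_cover` and the toy input are hypothetical and chosen purely for illustration.

```python
def greedy_set_cover(universe, subsets):
    """Textbook greedy set cover (illustrative sketch, not the paper's method).

    universe: iterable of hashable elements to be covered.
    subsets:  list of sets; returns the indices of the subsets chosen.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Pick the subset that covers the most still-uncovered elements
        # (ties go to the lowest index).
        best = max(range(len(subsets)),
                   key=lambda i: len(uncovered & subsets[i]))
        gain = uncovered & subsets[best]
        if not gain:
            raise ValueError("subsets do not cover the universe")
        chosen.append(best)
        uncovered -= gain
    return chosen


if __name__ == "__main__":
    # Hypothetical toy instance: cover the elements 1..7.
    subsets = [{1, 2, 3, 4}, {4, 5}, {5, 6, 7}, {1, 6}]
    print(greedy_set_cover(range(1, 8), subsets))
```

Each round takes the subset with the largest marginal gain, so the loop runs in polynomial time. The greedy choice is a heuristic and is not guaranteed to return a minimum cover.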