Conservative extensions of linkage disequilibrium measures from pairwise to multi-loci and algorithms for optimal tagging SNP selection

Authors:
Ryan Tarpine;Fumei Lam;Sorin Istrail
Affiliations:
Center for Computational Molecular Biology, Department of Computer Science, Brown University, Providence, RI;Department of Computer Science, University of California, Davis, CA;Center for Computational Molecular Biology, Department of Computer Science, Brown University, Providence, RI
Venue:
RECOMB'11 Proceedings of the 15th Annual international conference on Research in computational molecular biology
Year:
2011

References:
Haplotypes and informative SNP selection algorithms: don't block out information. RECOMB '03 Proceedings of the seventh annual international conference on Research in computational molecular biology
Methods for Inferring Block-Wise Ancestral History from Haploid Sequences. WABI '02 Proceedings of the Second International Workshop on Algorithms in Bioinformatics
SNPs Problems, Complexity, and Algorithms. ESA '01 Proceedings of the 9th Annual European Symposium on Algorithms
Combinatorial problems arising in SNP and haplotype analysis. DMTCS'03 Proceedings of the 4th international conference on Discrete mathematics and theoretical computer science

Abstract

We present results on two classes of problems. The first addresses the long-standing open problem of finding unifying principles for linkage disequilibrium (LD) measures in population genetics (Lewontin 1964 [10], Hedrick 1987 [8], Devlin and Risch 1995 [5]). Two desirable properties have been proposed in the extensive literature on this topic, and the mutual consistency between them has remained at the heart of the statistical and algorithmic difficulties in haplotype analysis and genome-wide association studies. The first axiom is (1) the ability to extend an LD measure to multiple loci as a conservative extension of pairwise LD. All widely used LD measures are pairwise, and despite significant attempts it is not clear how to extend them naturally to multiple loci, leading to a "curse of the pairwise." The second axiom is (2) the interpretability of intermediate values. We resolve this mutual consistency problem by introducing a new LD measure, directed informativeness I (the directed graph-theoretic counterpart of the informativeness measure introduced by Halldorsson et al. [6]), and show that it satisfies both axioms.
We also show that the maximum informative subset of tagging SNPs based on I can be computed exactly in polynomial time for realistic genome-wide data. Furthermore, we present polynomial-time algorithms for optimal genome-wide tagging SNP selection for a number of commonly used LD measures, under the bounded neighborhood assumption for linked pairs of SNPs. One problem in the area is the search for a quality measure for tagging SNP selection that unifies LD-based methods such as LD-select (Carlson et al. 2004 [3], implemented in Tagger, de Bakker et al. 2005 [4]) with information-theoretic ones such as informativeness.
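The pairwise measures referred to above can be made concrete with a short sketch. The Python below computes Lewontin's D and D' and the squared correlation r² for two biallelic SNPs from a list of phased haplotypes, and adds a set-based informativeness score to show how a measure defined on a set of tag SNPs reduces to a pairwise one when the set has a single element. The haplotype encoding ('0'/'1' strings), the function names, and the informativeness formula (our reading of Halldorsson et al. [6]: the fraction of haplotype pairs that differ at a target SNP t and also differ at some tag SNP) are assumptions for illustration; this is not the paper's directed informativeness I.

```python
from itertools import combinations


def pairwise_ld(haps, a, b):
    """Return Lewontin's D, normalised D', and r^2 between SNP columns a and b.

    haps: list of equal-length '0'/'1' strings (hypothetical phased haplotypes).
    """
    n = len(haps)
    p_a = sum(h[a] == "1" for h in haps) / n
    p_b = sum(h[b] == "1" for h in haps) / n
    p_ab = sum(h[a] == "1" and h[b] == "1" for h in haps) / n
    d = p_ab - p_a * p_b
    # D_max depends on the sign of D (Lewontin 1964)
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    # D' and r^2 are undefined if either SNP is monomorphic; return 0.0 by convention
    d_prime = d / d_max if d_max > 0 else 0.0
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


def informativeness(haps, tags, t):
    """Fraction of haplotype pairs split at SNP t that are also split by some SNP in tags.

    Assumed formulation for illustration; with tags == [s] it is a pairwise measure.
    """
    split = [(x, y) for x, y in combinations(haps, 2) if x[t] != y[t]]
    if not split:
        return 1.0  # assumed convention: nothing to distinguish at a monomorphic t
    caught = sum(any(x[s] != y[s] for s in tags) for x, y in split)
    return caught / len(split)


if __name__ == "__main__":
    d, d_prime, r2 = pairwise_ld(["00", "00", "01", "11"], 0, 1)
    print(f"D={d:.4f}  D'={d_prime:.4f}  r2={r2:.4f}")
    haps = ["000", "011", "101", "110"]
    print(informativeness(haps, [0], 2), informativeness(haps, [0, 1], 2))
```

On the toy haplotypes 00, 00, 01, 11 the two SNPs have D' = 1 but r² = 1/3, a standard reminder that the pairwise measures can disagree. On 000, 011, 101, 110 with target SNP 2, the single tag {0} scores 0.5 while the set {0, 1} scores 1.0: the set-level score is not simply a combination of the two pairwise scores.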
We show that the objective function of the LD-select algorithm is the minimum dominating set (MDS) problem on r²-SNP graphs, and that MDS can be computed in polynomial time for this class of graphs. Although LD-select obtains its "maximally informative" solution with a greedy algorithm, so that the result is better described as "locally maximally informative," we show that Tagger (LD-select) in fact performs very close to the globally maximally informative optimum.
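The r²-SNP graph formulation and the greedy-versus-optimal comparison can be illustrated on a toy graph. The sketch below treats SNPs 0..n-1 in chromosome order as vertices, with an edge whenever a pair is linked (for example r² above a threshold), and models the bounded neighborhood assumption as a bandwidth bound k: linked SNPs are at most k positions apart. It compares a simplified greedy heuristic in the spirit of LD-select (repeatedly take the SNP that tags the most untagged SNPs) with an exact minimum dominating set computed by dynamic programming over a sliding window of k SNPs, which runs in O(n·3^k·k) time and is therefore polynomial in n for fixed k. The function names, the bandwidth model, and the DP are illustrative assumptions, not the paper's algorithm or Tagger's implementation.

```python
# Toy comparison of a greedy tag-SNP heuristic and an exact minimum dominating set.
# Assumptions (hypothetical): SNPs are numbered 0..n-1 in chromosome order, an edge
# (i, j) means the pair is linked (e.g. r^2 >= threshold), and the bounded
# neighbourhood assumption is modelled as bandwidth k: every edge has |i - j| <= k.

UNDOM, DOM, SEL = 0, 1, 2  # status of a SNP: untagged, tagged by a neighbour, selected


def greedy_dominating_set(n, edges):
    """Repeatedly select the SNP whose closed neighbourhood tags the most untagged SNPs."""
    nbrs = [{v} for v in range(n)]
    for i, j in edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    untagged, chosen = set(range(n)), []
    while untagged:
        best = max(range(n), key=lambda v: len(nbrs[v] & untagged))
        chosen.append(best)
        untagged -= nbrs[best]
    return sorted(chosen)


def exact_dominating_set(n, edges, k):
    """Minimum dominating set by DP over a sliding window of the last k SNPs.

    There are at most 3**k window states, so the running time is O(n * 3**k * k).
    """
    adj = [set() for _ in range(n)]
    for i, j in edges:
        if abs(i - j) > k:
            raise ValueError(f"edge ({i}, {j}) violates bandwidth {k}")
        adj[i].add(j)
        adj[j].add(i)
    # window = statuses of SNPs max(0, v - k) .. v - 1  ->  (cost, selected SNPs)
    states = {(): (0, ())}
    for v in range(n):
        first = max(0, v - k)
        linked = [pos for pos, u in enumerate(range(first, v)) if u in adj[v]]
        new_states = {}
        for window, (cost, chosen) in states.items():
            for select in (False, True):
                w = list(window)
                if select:
                    for pos in linked:  # v tags every linked SNP still in the window
                        if w[pos] == UNDOM:
                            w[pos] = DOM
                    w.append(SEL)
                else:
                    tagged = any(w[pos] == SEL for pos in linked)
                    w.append(DOM if tagged else UNDOM)
                if len(w) > k:
                    # the oldest SNP has no unprocessed neighbours left, so it must be tagged
                    oldest = w.pop(0)
                    if oldest == UNDOM:
                        continue
                key = tuple(w)
                cand = (cost + int(select), chosen + ((v,) if select else ()))
                if key not in new_states or cand[0] < new_states[key][0]:
                    new_states[key] = cand
        states = new_states
    best = min(val for w, val in states.items() if UNDOM not in w)
    return sorted(best[1])


if __name__ == "__main__":
    # Hypothetical r^2-SNP graph with bandwidth 2: SNP 3 has the largest neighbourhood.
    edges = [(0, 1), (1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5), (5, 6)]
    print("greedy:", greedy_dominating_set(7, edges))
    print("exact: ", exact_dominating_set(7, edges, k=2))
```

On this graph the greedy heuristic first takes SNP 3, whose neighborhood is largest, and ends with three tags ([0, 3, 5]), while the exact search returns [1, 5]. The gap comes from a deliberately contrived graph and says nothing about how Tagger performs on the data studied in the paper.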