A rigorous analysis of population stratification with limited data
SODA '07 Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms
Separating populations with wide data: a spectral analysis
ISAAC'07 Proceedings of the 18th international conference on Algorithms and computation
In this thesis, we study three problems: routing, edge disjoint paths, and classification.

In hierarchical routing, we obtain "near-optimal" routing table size and path stretch through a randomized hierarchical decomposition scheme in the metric space induced by a graph. We say that a metric $(X, d)$ has doubling dimension $\dim(X)$ at most $\alpha$ if every set of diameter $D$ can be covered by $2^{\alpha}$ sets of diameter $D/2$. (A doubling metric is one whose doubling dimension $\dim(X)$ is a constant.) For a connected graph $G$ whose shortest path distances $d_G$ induce the doubling metric $(X, d_G)$, we show how to perform $(1 + \tau)$-stretch routing on $G$ for any $0 < \tau \le 1$ with routing tables of size at most $(\alpha/\tau)^{O(\alpha)} \log\Delta \log\delta$ bits with only $(\alpha/\tau)^{O(\alpha)} \log\Delta$ entries, where $\Delta$ is the diameter of $G$ and $\delta$ is the maximum degree of $G$.

The Edge Disjoint Paths (EDP) problem in undirected graphs refers to the following: given a graph $G$ with $n$ nodes and a set $T$ of pairs of terminals, connect as many terminal pairs as possible using paths that are mutually edge disjoint. This leads to a variety of classic NP-complete problems, for which approximability is not well understood. We show a polylogarithmic approximation algorithm for the undirected EDP problem in general graphs with a moderate restriction on graph connectivity: we require the global minimum cut of $G$ to be $\Omega(\log^5 n)$. Our algorithm extends previous techniques in that it applies to graphs with high diameters and asymptotically large minors.

In the classification problem, we are given a set of $2N$ diploid individuals from populations $P_1$ and $P_2$ (with no admixture), and a small amount of multilocus genotype data from the same set of $K$ loci for all $2N$ individuals, and we aim to partition $P_1$ and $P_2$ perfectly. In our model, given the population of origin of each individual, the genotypes are assumed to be generated by drawing alleles independently at random across the $K$ loci, each from its own distribution. We show several results for this problem.
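The generative model in the classification problem can be illustrated with a small simulation. The sketch below is not taken from the thesis and makes several assumptions: each locus is biallelic, so a population's distribution at a locus is a single allele frequency; the two populations contribute N individuals each; and the frequencies are drawn uniformly from [0.1, 0.9]. The function name `simulate_genotypes` and all parameter values are hypothetical. It only generates data under the model and does not implement any of the separation algorithms.

```python
import random


def simulate_genotypes(freqs, n_individuals, rng):
    """Draw diploid genotypes under the independence model.

    freqs[k] is the (assumed biallelic) allele frequency at locus k for one
    population. Each individual receives two independent allele draws per
    locus, so every genotype entry is 0, 1, or 2.
    """
    return [
        [int(rng.random() < p) + int(rng.random() < p) for p in freqs]
        for _ in range(n_individuals)
    ]


rng = random.Random(0)   # fixed seed for reproducibility
K, N = 200, 10           # illustrative numbers of loci and individuals

# Hypothetical per-locus allele frequencies for the two populations.
freqs_p1 = [rng.uniform(0.1, 0.9) for _ in range(K)]
freqs_p2 = [rng.uniform(0.1, 0.9) for _ in range(K)]

# 2N individuals in total; the equal split between populations is an assumption.
genotypes = simulate_genotypes(freqs_p1, N, rng) + simulate_genotypes(freqs_p2, N, rng)
labels = [1] * N + [2] * N   # population of origin, hidden from a classifier
```

A classifier in this setting would receive only `genotypes` and would be judged on whether it recovers `labels` exactly, up to swapping the two population names.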