Approximation algorithms for speeding up dynamic programming and denoising aCGH data

  • Authors: Charalampos E. Tsourakakis; Richard Peng; Maria A. Tsiarli; Gary L. Miller; Russell Schwartz
  • Affiliations: Carnegie Mellon University; Carnegie Mellon University; University of Pittsburgh; Carnegie Mellon University; Carnegie Mellon University
  • Venue: Journal of Experimental Algorithmics (JEA)
  • Year: 2011

Abstract

The development of cancer is largely driven by the gain or loss of subsets of the genome, promoting uncontrolled growth or disabling defenses against it. Denoising array-based Comparative Genome Hybridization (aCGH) data is an important computational problem central to understanding cancer evolution. In this article, we propose a new formulation of the denoising problem that we solve with a “vanilla” dynamic programming algorithm, which runs in $O(n^2)$ units of time. Then, we propose two approximation techniques. Our first algorithm reduces the problem into a well-studied geometric problem, namely halfspace emptiness queries, and provides an $\epsilon$ additive approximation to the optimal objective value in $\tilde{O}(n^{4/3+\delta} \log(U/\epsilon))$ time, where $\delta$ is an arbitrarily small positive constant and $U = \max\{\sqrt{C}, (|P_i|)_{i=1,\ldots,n}\}$ ($P = (P_1, P_2, \ldots, P_n)$, $P_i \in \mathbb{R}$, is the vector of the noisy aCGH measurements, $C$ a normalization constant). The second algorithm provides a $(1 \pm \epsilon)$ approximation (multiplicative error) and runs in $O(n \log n / \epsilon)$ time. The algorithm decomposes the initial problem into a small (logarithmic) number of Monge optimization subproblems that we can solve in linear time using existing techniques. Finally, we validate our model on synthetic and real cancer datasets. Our method consistently achieves superior precision and recall to leading competitors on the data with ground truth. In addition, it finds several novel markers not recorded in the benchmarks but supported in the oncology literature.
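
To make the "vanilla" quadratic-time dynamic program concrete, here is a minimal Python sketch. The abstract does not state the objective, so the sketch assumes one: fit a piecewise-constant signal to the measurements P, paying the squared error of each segment around its mean plus a penalty C per segment. The function name `denoise`, the helper `seg_cost`, and the choice to charge C for every segment (including the first) are illustrative assumptions, not taken from the article.

```python
from itertools import accumulate


def denoise(P, C):
    """Quadratic-time DP for an ASSUMED objective:
    sum of squared errors to a piecewise-constant fit + C per segment."""
    n = len(P)
    # Prefix sums of values and of squares give any segment's error in O(1).
    S = [0.0] + list(accumulate(P))
    Q = [0.0] + list(accumulate(p * p for p in P))

    def seg_cost(j, i):
        # Squared error of fitting P[j:i] (half-open, 0-based) by its mean.
        s = S[i] - S[j]
        return (Q[i] - Q[j]) - s * s / (i - j)

    opt = [0.0] * (n + 1)   # opt[i]: best cost for the first i points
    back = [0] * (n + 1)    # back[i]: start index of the last segment
    for i in range(1, n + 1):
        best, arg = float("inf"), 0
        for j in range(i):  # try every start j of the last segment
            cost = opt[j] + seg_cost(j, i) + C
            if cost < best:
                best, arg = cost, j
        opt[i], back[i] = best, arg

    # Walk the back-pointers to recover the fitted signal.
    F = [0.0] * n
    i = n
    while i > 0:
        j = back[i]
        mean = (S[i] - S[j]) / (i - j)
        F[j:i] = [mean] * (i - j)
        i = j
    return opt[n], F
```

Calling `denoise(P, C)` returns the optimal cost under the assumed objective together with the fitted values. The two nested loops over i and j give the $O(n^2)$ running time; under this assumed formulation, a small C favors many short segments and a large C favors a single flat fit.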