Efficient calculation of interval scores for DNA copy number data analysis

Authors:
Doron Lipson;Yonatan Aumann;Amir Ben-Dor;Nathan Linial;Zohar Yakhini
Affiliations:
Computer Science Dept., Technion, Haifa;Computer Science Dept., Bar-Ilan University, Ramat Gan;Agilent Laboratories;Computer Science Dept., Hebrew University of Jerusalem;Computer Science Dept., Technion, Haifa
Venue:
RECOMB'05 Proceedings of the 9th Annual international conference on Research in Computational Molecular Biology
Year:
2005

Abstract

Background. DNA amplifications and deletions characterize cancer genomes and are often related to disease evolution. Microarray-based techniques for measuring these DNA copy-number changes use fluorescence ratios at arrayed DNA elements (BACs, cDNA or oligonucleotides) to provide signals at high resolution in terms of genomic location. These data are then analyzed further to map aberrations and their boundaries and to identify biologically significant structures.

Methods. We develop a statistical framework that casts several DNA copy number data analysis questions as optimization problems over real-valued vectors of signals. The simplest form of the optimization problem seeks to maximize $\varphi(I) = \sum_{i \in I} v_i / \sqrt{|I|}$ over all subintervals $I$ of the input vector. We present, and prove the correctness of, a linear-time approximation scheme for this problem: a procedure with time complexity $O(n\varepsilon^{-2})$ that outputs an interval for which $\varphi(I)$ is at least $\mathrm{Opt}/\alpha(\varepsilon)$, where Opt is the true optimum and $\alpha(\varepsilon) \to 1$ as $\varepsilon \to 0$. We further develop practical implementations that improve on the performance of the naive quadratic approach by orders of magnitude. We discuss properties of optimal intervals and how they affect the performance of the algorithms.

Examples. We benchmark our algorithms on synthetic data as well as on publicly available DNA copy number data. We demonstrate the use of these methods for identifying aberrations in single samples, as well as common alterations in fixed sets and subsets of breast cancer samples.
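To make the objective concrete, the sketch below illustrates the naive quadratic baseline that the abstract mentions: it scores every subinterval using prefix sums and keeps the best one. It is not the paper's approximation scheme or any of its practical implementations. The function name `max_interval_score` and the half-open `(start, end)` index convention are assumptions made for this example only.

```python
from itertools import accumulate
from math import sqrt


def max_interval_score(v):
    """Exhaustively maximize sum(v[start:end]) / sqrt(end - start).

    Hypothetical helper for illustration; returns (score, start, end)
    with a half-open interval [start, end). Runs in O(n^2) time.
    """
    if not v:
        raise ValueError("input vector must be non-empty")

    # prefix[k] holds v[0] + ... + v[k-1], so any interval sum is one subtraction.
    prefix = [0.0] + list(accumulate(v))
    n = len(v)

    best = (float("-inf"), 0, 0)
    for start in range(n):
        for end in range(start + 1, n + 1):
            score = (prefix[end] - prefix[start]) / sqrt(end - start)
            if score > best[0]:  # strict comparison keeps the first maximizer
                best = (score, start, end)
    return best


if __name__ == "__main__":
    signal = [1, 1, -5, 2, 2, 2, -1]
    print(max_interval_score(signal))
```

Dividing by $\sqrt{|I|}$ rather than $|I|$ means a longer run of moderately elevated values can outscore a single high value, which is why the three consecutive 2s in the sample input beat any single element.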