Integrating copy number polymorphisms into array CGH analysis using a robust HMM

Authors:
Sohrab P. Shah;Xiang Xuan;Ron J. DeLeeuw;Mehrnoush Khojasteh;Wan L. Lam;Raymond Ng;Kevin P. Murphy
Venue:
Bioinformatics
Year:
2006

Citing 0
Cited 7

Classification of Sporadic and BRCA1 Ovarian Cancer Based on a Genome-Wide Study of Copy Number Variations
KES '08 Proceedings of the 12th international conference on Knowledge-Based Intelligent Information and Engineering Systems, Part II

GIMscan: a new statistical method for analyzing whole-genome array CGH data
RECOMB'07 Proceedings of the 11th annual international conference on Research in computational molecular biology

Adaptive image segmentation for region-based object retrieval using generalized Hough transform
Pattern Recognition

A Bayesian analysis for identifying DNA copy number variations using a compound Poisson process
EURASIP Journal on Bioinformatics and Systems Biology

Approximation algorithms for speeding up dynamic programming and denoising aCGH data
Journal of Experimental Algorithmics (JEA)

Detection of chromosomal abnormalities using high resolution arrays in clinical cancer research
Journal of Biomedical Informatics

A robust hidden semi-Markov model with application to aCGH data processing
International Journal of Data Mining and Bioinformatics

Quantified Score

Hi-index	3.84

Abstract

Motivation: Array comparative genomic hybridization (aCGH) is a pervasive technique used to identify chromosomal aberrations in human diseases, including cancer. Aberrations are defined as regions of increased or decreased DNA copy number, relative to a normal sample. Accurately identifying the locations of these aberrations has many important medical applications. Unfortunately, the observed copy number changes are often corrupted by various sources of noise, making the boundaries hard to detect. One popular current technique uses hidden Markov models (HMMs) to divide the signal into regions of constant copy number called segments; a subsequent classification phase labels each segment as a gain, a loss or neutral. Unfortunately, standard HMMs are sensitive to outliers, causing over-segmentation, where segments erroneously span very short regions.

Results: We propose a simple modification that makes the HMM robust to such outliers. More importantly, this modification allows us to exploit prior knowledge about the likely location of “outliers”, which are often due to copy number polymorphisms (CNPs). By “explaining away” these outliers with prior knowledge about the locations of CNPs, we can focus attention on the more clinically relevant aberrated regions. We show significant improvements over the current state-of-the-art technique (DNAcopy with MergeLevels) on previously published data from mantle cell lymphoma cell lines, and on published benchmark synthetic data augmented with outliers.

Availability: Source code written in Matlab is available from.

Contact: sshah@cs.ubc.ca
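As a rough illustration of the idea described under Results, the sketch below decodes a three-state HMM (loss, neutral, gain) whose emission is a mixture of the state's Gaussian and a flat outlier component, with a per-probe outlier prior that can be raised where a known CNP overlaps the probe. This is not the authors' model or their Matlab code: the mixture form, the function name `robust_viterbi`, the `cnp_prior` argument and every numeric default are assumptions chosen only to make the example small and runnable.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def robust_viterbi(log_ratios, cnp_prior, means=(-0.5, 0.0, 0.5), sd=0.1,
                   stay=0.99, outlier_density=0.25):
    """Most likely state path (0 = loss, 1 = neutral, 2 = gain).

    cnp_prior[t] is an assumed prior probability that probe t is an outlier,
    e.g. larger where a known CNP overlaps the probe. With cnp_prior all
    zero the emission is a plain Gaussian, i.e. a standard HMM.
    All parameter values are illustrative assumptions, not fitted values.
    """
    n_states = len(means)
    switch = (1.0 - stay) / (n_states - 1)
    log_trans = [[math.log(stay if i == j else switch)
                  for j in range(n_states)] for i in range(n_states)]

    def log_emit(x, state, p_out):
        # Mixture: the state's Gaussian, or a flat "outlier" density that
        # does not depend on the state, so outliers favour no state.
        inlier = (1.0 - p_out) * gaussian_pdf(x, means[state], sd)
        outlier = p_out * outlier_density
        return math.log(max(inlier + outlier, 1e-300))

    # Initialise with a uniform prior over states.
    score = [math.log(1.0 / n_states) + log_emit(log_ratios[0], s, cnp_prior[0])
             for s in range(n_states)]
    back = []
    for t in range(1, len(log_ratios)):
        new_score, pointers = [], []
        for j in range(n_states):
            best_i = max(range(n_states),
                         key=lambda i: score[i] + log_trans[i][j])
            pointers.append(best_i)
            new_score.append(score[best_i] + log_trans[best_i][j]
                             + log_emit(log_ratios[t], j, cnp_prior[t]))
        score = new_score
        back.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy signal: neutral probes, one spike at index 5, then a real 8-probe gain.
data = [0.0] * 5 + [0.6] + [0.0] * 5 + [0.5] * 8 + [0.0] * 4
plain = robust_viterbi(data, [0.0] * len(data))   # standard Gaussian HMM
prior = [0.01] * len(data)
prior[5] = 0.5                                    # hypothetical known CNP
robust = robust_viterbi(data, prior)
```

On this toy signal the plain model gives the single spike at index 5 its own one-probe gain segment, while the version with the raised prior keeps that probe in the neutral segment and still reports the 8-probe gain. Because the outlier component is shared by all states, a probe flagged as a likely CNP carries little evidence for or against any state, which is one simple way to "explain away" an isolated spike.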