Fast estimation of posterior probabilities in change-point analysis through a constrained hidden Markov model

Authors:
The Minh Luong; Yves Rozenholc; Gregory Nuel
Venue:
Computational Statistics & Data Analysis
Year:
2013

Abstract

The detection of change-points in heterogeneous sequences is a statistical challenge with applications across a wide variety of fields. In bioinformatics, a vast amount of methodology exists to identify an ideal set of change-points for detecting Copy Number Variation (CNV). While a considerable number of efficient algorithms are currently available for finding the best segmentation of CNV data, relatively few approaches consider the important problem of assessing the uncertainty of the change-point locations. Asymptotic and stochastic approaches exist but often require additional model assumptions to speed up the computations, while exact methods generally have quadratic complexity, which may be intractable for large data sets of tens of thousands of points or more. A hidden Markov model, with constraints specifically chosen to correspond to a segment-based change-point model, provides an exact method for obtaining the posterior distribution of change-points with linear complexity. The methods are implemented in the R package postCP, which uses the results of a given change-point detection algorithm to estimate the probability that each observation is a change-point. The results include an application of postCP to a publicly available CNV data set (n=120). Due to its frequentist framework, postCP obtains less conservative confidence intervals than previously published Bayesian methods, and does so with linear rather than quadratic complexity. Simulations showed that postCP provided comparable loss to a Bayesian MCMC method when estimating posterior means, specifically when assessing larger-scale changes, while being more computationally efficient. On another high-resolution CNV data set (n=14,241), the implementation processed the data in less than one second on a mid-range laptop computer.
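
The constrained hidden Markov model idea described in the abstract can be illustrated with a minimal Python sketch. It is not the postCP implementation or its API. The function name `changepoint_posterior` is hypothetical, and the sketch assumes Gaussian observations with a known common standard deviation, segment means supplied by some earlier segmentation step, and a uniform prior over segmentations with a fixed number of segments. The hidden state is the segment index, which may only stay the same or increase by one, and the chain is forced to start in the first segment and end in the last. A forward-backward pass then gives the posterior probability that each position is the last point of a segment, at a cost linear in the sequence length.

```python
import numpy as np

def changepoint_posterior(x, means, sigma=1.0):
    """Posterior change-point probabilities under a constrained HMM (sketch).

    Returns an array `post` of shape (n - 1, K - 1), where post[i, k] is the
    posterior probability that position i is the last point of segment k
    (0-based), i.e. that the k-th change occurs between x[i] and x[i + 1].
    Assumes Gaussian emissions with known means and a common sigma.
    """
    x = np.asarray(x, dtype=float)
    means = np.asarray(means, dtype=float)
    n, K = len(x), len(means)

    # Log emission densities up to an additive constant; the constant is the
    # same for every path and cancels when normalising by the evidence.
    log_e = -0.5 * ((x[:, None] - means[None, :]) / sigma) ** 2

    # Forward pass: the chain must start in segment 0 and can only stay in
    # the current segment or move to the next one.
    log_f = np.full((n, K), -np.inf)
    log_f[0, 0] = log_e[0, 0]
    for i in range(1, n):
        stay = log_f[i - 1]
        move = np.concatenate(([-np.inf], log_f[i - 1, :-1]))
        log_f[i] = log_e[i] + np.logaddexp(stay, move)

    # Backward pass: the chain must end in the last segment, K - 1.
    log_b = np.full((n, K), -np.inf)
    log_b[n - 1, K - 1] = 0.0
    for i in range(n - 2, -1, -1):
        nxt = log_e[i + 1] + log_b[i + 1]
        move = np.concatenate((nxt[1:], [-np.inf]))
        log_b[i] = np.logaddexp(nxt, move)

    # Combine: state k at position i, state k + 1 at position i + 1.
    log_evidence = log_f[n - 1, K - 1]
    log_post = log_f[:-1, :-1] + log_e[1:, 1:] + log_b[1:, 1:] - log_evidence
    return np.exp(log_post)

# Toy usage: ten points near 0 followed by ten points near 5.
x = np.concatenate((np.zeros(10), np.full(10, 5.0)))
post = changepoint_posterior(x, means=[0.0, 5.0], sigma=1.0)
```

Each forward or backward step touches every segment once, so the work is proportional to n times K, which is linear in n for a fixed number of segments. Each column of `post` sums to one, because every admissible path contains exactly one transition out of each segment except the last.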