A robust method for transcript quantification with RNA-seq data

Authors:
Yan Huang; Yin Hu; Corbin D. Jones; James N. MacLeod; Derek Y. Chiang; Yufeng Liu; Jan F. Prins; Jinze Liu
Affiliations:
Department of Computer Science, University of North Carolina, Chapel Hill; Department of Computer Science, University of North Carolina, Chapel Hill; Department of Biology, University of North Carolina, Chapel Hill; Department of Veterinary Science, University of Kentucky; Department of Genetics, University of North Carolina, Chapel Hill; Department of Statistics and Operations Research, University of North Carolina, Chapel Hill; Department of Computer Science, University of North Carolina, Chapel Hill; Department of Computer Science, University of North Carolina, Chapel Hill
Venue:
RECOMB'12: Proceedings of the 16th Annual International Conference on Research in Computational Molecular Biology
Year:
2012

Abstract

The advent of high-throughput RNA-seq technology allows deep sampling of the transcriptome, making it possible to characterize both the diversity and the abundance of transcript isoforms. Accurate abundance estimation, or transcript quantification, of isoforms is critical for downstream differential analysis (e.g., healthy vs. diseased cells), but it remains a challenging problem for several reasons. First, although various types of algorithms have been developed for abundance estimation, short reads often do not uniquely identify the transcript isoforms from which they were sampled. As a result, the quantification problem may not be identifiable, i.e., it may lack a unique transcript solution even when every read maps uniquely to the reference genome. In this paper, we develop a general linear model for transcript quantification that leverages reads spanning multiple splice junctions to ameliorate identifiability. Second, RNA-seq reads sampled from the transcriptome exhibit unknown position-specific and sequence-specific biases. We extend our method to learn the bias parameters simultaneously with transcript quantification, improving accuracy. Third, transcript quantification is often given a candidate set of isoforms, not all of which are likely to be significantly expressed in a given tissue type or condition. By solving the linear system with LASSO, our approach can infer an accurate set of dominantly expressed transcripts, whereas existing methods tend to assign positive expression to every candidate isoform. On simulated RNA-seq datasets, our method achieved better quantification accuracy than existing methods. Application of our method to real data demonstrated experimentally that transcript quantification is effective for differential analysis of transcriptomes.
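The abstract describes the model only at a high level. The Python sketch below is a hypothetical illustration, not the authors' formulation. It assumes a toy gene with exons 1 to 5 and four candidate isoforms T1 to T4, uses a 0/1 read-compatibility matrix in place of effective-length weights, ignores the position- and sequence-specific bias parameters, and swaps in a plain non-negative LASSO solved by cyclic coordinate descent. It illustrates two of the claims. First, with exon and single-junction reads the design matrix is rank-deficient, so the abundances are not identifiable, while adding one read class that spans two junctions restores full column rank. Second, the L1 penalty drives unexpressed candidate isoforms to exactly zero. All names (ISOFORMS, design_matrix, nonneg_lasso, lam) are made up for this example.

```python
from fractions import Fraction

# Hypothetical gene with exons 1..5; exons 2 and 4 are alternatively spliced.
ISOFORMS = {
    "T1": [1, 2, 3, 4, 5],
    "T2": [1, 3, 5],
    "T3": [1, 2, 3, 5],
    "T4": [1, 3, 4, 5],
}

# Read classes: single exons plus reads spanning exactly one splice junction.
SINGLE_FEATURES = [[e] for e in range(1, 6)] + [[1, 2], [2, 3], [1, 3], [3, 4], [4, 5], [3, 5]]
# Add one read class spanning two junctions (exon 2 -> 3 -> 4).
MULTI_FEATURES = SINGLE_FEATURES + [[2, 3, 4]]


def contains_path(exons, path):
    """True if `path` appears as consecutive exons in the isoform."""
    k = len(path)
    return any(exons[i:i + k] == path for i in range(len(exons) - k + 1))


def design_matrix(features, isoforms):
    """A[i][j] = 1 if read class i is compatible with isoform j (unit weights assumed)."""
    return [[1 if contains_path(ex, f) else 0 for ex in isoforms.values()]
            for f in features]


def rank(matrix):
    """Exact matrix rank by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in matrix]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r


def nonneg_lasso(A, y, lam, max_sweeps=10000, tol=1e-12):
    """Minimise 0.5*||y - A x||^2 + lam*sum(x) subject to x >= 0 (cyclic coordinate descent)."""
    n_rows, n_cols = len(A), len(A[0])
    x = [0.0] * n_cols
    col_sq = [sum(A[i][j] ** 2 for i in range(n_rows)) for j in range(n_cols)]
    resid = [float(v) for v in y]  # residual y - A x, with x = 0 initially
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(n_cols):
            if col_sq[j] == 0:
                continue
            # Correlation of column j with the residual excluding x_j's own contribution.
            rho = sum(A[i][j] * resid[i] for i in range(n_rows)) + col_sq[j] * x[j]
            new = max(0.0, (rho - lam) / col_sq[j])  # soft-threshold, clipped at zero
            delta = new - x[j]
            if delta != 0.0:
                for i in range(n_rows):
                    resid[i] -= A[i][j] * delta
                x[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return x


if __name__ == "__main__":
    print(rank(design_matrix(SINGLE_FEATURES, ISOFORMS)))  # 3: T1..T4 not identifiable
    A = design_matrix(MULTI_FEATURES, ISOFORMS)
    print(rank(A))  # 4: the two-junction read class restores full column rank
    true_abundance = [10, 0, 5, 0]  # hypothetical: only T1 and T3 are expressed
    y = [sum(a * t for a, t in zip(row, true_abundance)) for row in A]
    x = nonneg_lasso(A, y, lam=1.0)
    print([round(v, 3) for v in x])  # [9.971, 0.0, 4.882, 0.0]
```

With noise-free counts and lam = 1.0, the penalty slightly shrinks the two expressed isoforms (to 10 - 1/34 and 5 - 4/34) and sets T2 and T4 exactly to zero. On real data, lam would presumably need tuning, and the 0/1 entries would be replaced by expected read counts per unit abundance.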