Exact Transcriptome Reconstruction from Short Sequence Reads

Authors:
Vincent Lacroix;Michael Sammeth;Roderic Guigo;Anne Bergeron
Affiliations:
Genome Bioinformatics Research Group - CRG, Barcelona, Spain;Genome Bioinformatics Research Group - CRG, Barcelona, Spain;Genome Bioinformatics Research Group - CRG, Barcelona, Spain;Comparative Genomics Laboratory, Université du Québec à Montréal, Canada
Venue:
WABI '08 Proceedings of the 8th international workshop on Algorithms in Bioinformatics
Year:
2008


Abstract

In this paper we address the problem of characterizing the RNA complement of a given cell type, that is, the set of RNA species and their relative copy numbers, from a large set of short sequence reads randomly sampled from the cell's RNA sequences in a sequencing experiment. We refer to this as the transcriptome reconstruction problem, and we investigate, both theoretically and practically, the conditions under which it can be solved. We demonstrate that, even under the assumption of exact information, neither single reads nor paired-end reads guarantee in theory that the reconstruction problem has a unique solution. However, by investigating the behavior of the best annotated human gene set, we also show that, in practice, paired-end reads (but not single reads) may be sufficient to resolve the species and abundances of the vast majority of transcript variants. Finally, we show that, when the RNA species present in the cell are assumed to be known, single reads can effectively be used to infer transcript variant abundances.
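
The non-uniqueness claim for single reads can be illustrated with a toy case. The sketch below is not the paper's construction: the gene, the exon names A, B, C, D and the function exon_coverage are hypothetical, and it assumes exact, noise-free abundances and two alternative sites too far apart for one read to span both. Under those assumptions, two different transcript mixtures produce the same per-exon evidence.

```python
from collections import Counter


def exon_coverage(mixture):
    """Sum isoform abundances over each exon the isoform contains.

    `mixture` maps an isoform (a tuple of exon names) to its abundance.
    This stands in for what short single reads can observe locally.
    """
    coverage = Counter()
    for isoform, abundance in mixture.items():
        for exon in isoform:
            coverage[exon] += abundance
    return dict(coverage)


# Hypothetical gene: exons A/B are alternatives at one site, C/D at another.
# Assumption: the sites are too far apart for a single read to cover both.
mixture_1 = {("A", "C"): 1.0, ("B", "D"): 1.0}
mixture_2 = {("A", "D"): 1.0, ("B", "C"): 1.0}

# Both mixtures yield abundance 1.0 for each of A, B, C and D.
same_evidence = exon_coverage(mixture_1) == exon_coverage(mixture_2)
print(same_evidence)
```

If the set of isoforms is instead assumed to be known, say only A-C and B-D exist, the same coverage values determine the abundances directly (the coverage of A gives the abundance of A-C, and the coverage of B gives that of B-D), which is in the spirit of the abstract's final claim.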