In this paper we address the problem of characterizing the RNA complement of a given cell type, that is, the set of RNA species and their relative copy numbers, from a large set of short sequence reads randomly sampled from the cell's RNA sequences in a sequencing experiment. We refer to this as the transcriptome reconstruction problem, and we investigate, both theoretically and practically, the conditions under which it can be solved. We demonstrate that, even under the assumption of exact information, neither single-read nor paired-end read sequences theoretically guarantee that the reconstruction problem has a unique solution. However, by investigating the behavior of the best annotated human gene set, we also show that, in practice, paired-end reads (but not single reads) may be sufficient to resolve the species and abundances of the vast majority of transcript variants. Finally, we show that, when the RNA species existing in the cell are assumed to be known, single-read sequences can effectively be used to infer transcript variant abundances.
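
The non-uniqueness claim can be illustrated with a toy model. The sketch below is not the paper's formalization; it assumes a hypothetical gene with two alternative sites, A (variants a1 or a2) and B (variants b1 or b2), placed so far apart that no single read covers both. Under that assumption a read only reports the abundance of a variant at one site, and two different isoform mixtures can produce exactly the same signal. The names `local_signal`, `mixture_1`, and `mixture_2` are illustrative only.

```python
from collections import Counter
from fractions import Fraction


def local_signal(mixture):
    """Sum isoform abundances per local variant.

    Models exact (noise-free) single reads that each cover one site only,
    so the link between the variant at site A and the variant at site B
    is never observed.
    """
    signal = Counter()
    for isoform, abundance in mixture.items():
        for variant in isoform:
            signal[variant] += abundance
    return dict(signal)


half = Fraction(1, 2)

# Two different transcriptomes built from the same local variants.
mixture_1 = {("a1", "b1"): half, ("a2", "b2"): half}
mixture_2 = {("a1", "b2"): half, ("a2", "b1"): half}

# Both mixtures give every local variant an abundance of 1/2,
# so reads of this kind cannot tell them apart.
indistinguishable = local_signal(mixture_1) == local_signal(mixture_2)
print(indistinguishable)
```

In this toy setting the ambiguity disappears once the set of isoforms is known in advance (for example, if only a1-b1 and a2-b2 are annotated), which is the intuition behind the abstract's final statement about inferring abundances for known RNA species.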