Exact Transcriptome Reconstruction from Short Sequence Reads

  • Authors:
  • Vincent Lacroix;Michael Sammeth;Roderic Guigo;Anne Bergeron

  • Affiliations:
  • Genome Bioinformatics Research Group - CRG, Barcelona, Spain;Genome Bioinformatics Research Group - CRG, Barcelona, Spain;Genome Bioinformatics Research Group - CRG, Barcelona, Spain;Comparative Genomics Laboratory, Université du Québec à Montréal, Canada

  • Venue:
  • WABI '08 Proceedings of the 8th international workshop on Algorithms in Bioinformatics
  • Year:
  • 2008

Abstract

In this paper we address the problem of characterizing the RNA complement of a given cell type, that is, the set of RNA species and their relative copy numbers, from a large set of short sequence reads that have been randomly sampled from the cell's RNA sequences in a sequencing experiment. We refer to this as the transcriptome reconstruction problem, and we investigate, both theoretically and practically, the conditions under which it can be solved. We demonstrate that, even under the assumption of exact information, neither single reads nor paired-end reads theoretically guarantee that the reconstruction problem has a unique solution. However, by investigating the behavior of the best annotated human gene set, we also show that, in practice, paired-end reads (but not single reads) may be sufficient to resolve the species and abundances of the vast majority of transcript variants. We finally show that, when the RNA species existing in the cell are assumed to be known, single read sequences can effectively be used to infer transcript variant abundances.
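
To make the uniqueness question concrete, the following is a minimal Python sketch. It is a toy illustration, not the method of the paper. It assumes a hypothetical gene with two independent alternative events (exon `a` or `b`, then exon `c` or `d`), assumes that single reads only reveal how often each exon is covered, and uses the illustrative helper names `feature_coverage` and `infer_abundances`. Under these assumptions, two different transcript mixtures produce identical exon coverage, while knowing the transcript set in advance makes the abundances recoverable.

```python
from collections import Counter


def feature_coverage(abundances):
    """Sum transcript copy numbers over every feature (exon) each transcript contains."""
    coverage = Counter()
    for transcript, copies in abundances.items():
        for feature in transcript:
            coverage[feature] += copies
    return dict(coverage)


def infer_abundances(known_transcripts, coverage):
    """Toy inference: read each abundance off a feature unique to that transcript.

    Assumption: every known transcript has at least one feature that no other
    known transcript shares. Real data would need a more general solver.
    """
    result = {}
    for transcript in known_transcripts:
        others = {f for u in known_transcripts if u != transcript for f in u}
        unique = [f for f in transcript if f not in others]
        if not unique:
            raise ValueError(f"no unique feature for transcript {transcript}")
        result[transcript] = coverage.get(unique[0], 0)
    return result


# Two hypothetical mixtures built from the same four exons.
mix_1 = {("a", "c"): 1, ("b", "d"): 1}
mix_2 = {("a", "d"): 1, ("b", "c"): 1}

# Exon-level coverage cannot tell the two mixtures apart.
ambiguous = feature_coverage(mix_1) == feature_coverage(mix_2)

# If the transcript set is known in advance, the abundances follow directly.
known = [("a", "c"), ("b", "d")]
inferred = infer_abundances(known, feature_coverage(mix_1))

print(ambiguous)
print(inferred)
```

In this sketch a read pair that links the first event to the second would separate the two mixtures, which is the intuition behind paired-end reads helping in practice. The example does not model read sampling, read length, or sequencing noise.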