An algorithm for mapping short reads to a dynamically changing genomic sequence

  • Authors:
  • Costas S. Iliopoulos;Derrick Kourie;Laurent Mouchard;Themba K. Musombuka;Solon P. Pissis;Corne de Ridder

  • Affiliations:
  • King's College London, Dept. of Informatics, Strand, London WC2R 2LS, UK and Curtin University, Digital Ecosystems & Business Intelligence Institute, Center for Stringology & Applications, GPO Box ...;University of Pretoria, Dept. of Computer Science, Pretoria 0002, South Africa;King's College London, Dept. of Informatics, Strand, London WC2R 2LS, UK and Curtin University, Digital Ecosystems & Business Intelligence Institute, Center for Stringology & Applications, GPO Box ...;University of Pretoria, Dept. of Computer Science, Pretoria 0002, South Africa and University of South Africa, School of Computing, P.O. Box 392 UNISA 0003, South Africa;King's College London, Dept. of Informatics, Strand, London WC2R 2LS, UK;University of Pretoria, Dept. of Computer Science, Pretoria 0002, South Africa and University of South Africa, School of Computing, P.O. Box 392 UNISA 0003, South Africa

  • Venue:
  • Journal of Discrete Algorithms
  • Year:
  • 2012

Abstract

Next-generation sequencing technologies have redefined the way genome sequencing is performed. They can produce tens of millions of short sequences (reads) in a single experiment, at a much lower cost than was previously possible. This dramatic increase in the amount of data generated makes mapping (aligning) a set of reads to a reference genome a challenging task. In this paper, we study a variant of this problem: mapping the reads to a dynamically changing genomic sequence. We propose a new practical algorithm that employs a data structure designed to account for dynamic effects (replacements, insertions, deletions) on the genomic sequence. The experimental results presented demonstrate that the proposed approach can be extended and applied to the problem of mapping short reads to multiple related genomes.
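
To make the problem setting concrete, the following is a minimal sketch of mapping reads against a sequence that changes over time. It is not the algorithm or data structure proposed in the paper, whose details the abstract does not give. The class name DynamicKmerIndex, the seed length k, and the methods replace, insert, delete, and map_read are all hypothetical names chosen for illustration. The sketch assumes exact matching only and uses a plain k-mer hash index: a replacement updates only the k-mers that overlap the edited position, while insertions and deletions naively rebuild the whole index.

```python
from collections import defaultdict


class DynamicKmerIndex:
    """Hypothetical illustration: exact-match k-mer index over a mutable sequence."""

    def __init__(self, sequence, k):
        self.k = k
        self.seq = list(sequence)
        self._rebuild()

    def _kmer(self, i):
        # The k-mer starting at position i of the current sequence.
        return "".join(self.seq[i:i + self.k])

    def _rebuild(self):
        # Index every k-mer of the current sequence by its start position.
        self.index = defaultdict(set)
        for i in range(len(self.seq) - self.k + 1):
            self.index[self._kmer(i)].add(i)

    def _window(self, pos):
        # Start positions of all k-mers that overlap position pos.
        lo = max(0, pos - self.k + 1)
        hi = min(pos, len(self.seq) - self.k)
        return range(lo, hi + 1)

    def replace(self, pos, base):
        # Local update: only the k-mers overlapping pos change.
        for i in self._window(pos):
            self.index[self._kmer(i)].discard(i)
        self.seq[pos] = base
        for i in self._window(pos):
            self.index[self._kmer(i)].add(i)

    def insert(self, pos, base):
        # Naive assumption: positions shift, so rebuild the whole index.
        self.seq.insert(pos, base)
        self._rebuild()

    def delete(self, pos):
        # Naive assumption: positions shift, so rebuild the whole index.
        del self.seq[pos]
        self._rebuild()

    def map_read(self, read):
        # Seed with the first k-mer of the read, then verify the full read.
        # Assumes len(read) >= k.
        text = "".join(self.seq)
        seed = read[:self.k]
        hits = self.index.get(seed, ())
        return sorted(i for i in hits if text[i:i + len(read)] == read)


idx = DynamicKmerIndex("ACGTACGT", k=3)
before = idx.map_read("ACGT")   # occurrences in the original sequence
idx.replace(4, "T")             # sequence becomes ACGTTCGT
after = idx.map_read("ACGT")    # occurrences after the replacement
```

The full rebuild on insertion and deletion is the cost that a purpose-built dynamic structure would aim to avoid; it is kept here only to keep the sketch short and easy to check.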