Detecting Repeat Families in Incompletely Sequenced Genomes

  • Authors:
  • José Augusto Amgarten Quitzau;Jens Stoye

  • Affiliations:
  • AG Genominformatik, Technische Fakultät, and International NRW Graduate School in Bioinformatics and Genome Research, Bielefeld University, Germany;AG Genominformatik, Technische Fakultät,

  • Venue:
  • WABI '08 Proceedings of the 8th international workshop on Algorithms in Bioinformatics
  • Year:
  • 2008

Abstract

Repeats form a major class of sequences in genomes, with implications for both functional genomics and practical problems. Their detection and analysis pose a number of challenges in genomic sequence analysis, especially if the genome is not completely sequenced. The most abundant and evolutionarily active repeats occur as families of long, similar sequences. We present a novel method for detecting and characterizing repeat families in cases where the target genome sequence is not completely known. To this end, we first establish the sequence graph, a compacted version of sparse de Bruijn graphs. By analyzing the structure of this graph and its connected components after local modifications, we devise two algorithms for repeat family detection. The applicability of the methods is shown on both simulated and real genomic data sets.
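
To make the idea of a compacted de Bruijn graph concrete, the following is a minimal sketch under stated assumptions. It is not the paper's sequence graph or either of its detection algorithms, since the abstract does not define the sparse variant. It builds an ordinary de Bruijn graph over k-mers from a set of reads and merges each maximal non-branching path into a single labelled node. The function names `de_bruijn` and `compact`, the choice of k-mers as nodes, and the toy reads are all illustrative assumptions.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Plain de Bruijn graph: node = k-mer, edge u -> v if v follows u in a read."""
    nodes = set()
    succ = defaultdict(set)
    pred = defaultdict(set)
    for read in reads:
        kmers = [read[i:i + k] for i in range(len(read) - k + 1)]
        nodes.update(kmers)
        for u, v in zip(kmers, kmers[1:]):
            succ[u].add(v)
            pred[v].add(u)
    return nodes, succ, pred


def compact(nodes, succ, pred):
    """Merge every maximal non-branching path into one node label (a string)."""

    def continues_path(n):
        # True if n is reached from exactly one node that has no other successor.
        if len(pred[n]) != 1:
            return False
        p = next(iter(pred[n]))
        return p != n and len(succ[p]) == 1

    seen = set()
    labels = []
    # Path starts first; the trailing pass over all nodes picks up isolated cycles.
    candidates = [n for n in sorted(nodes) if not continues_path(n)] + sorted(nodes)
    for start in candidates:
        if start in seen:
            continue
        seen.add(start)
        label, cur = start, start
        while len(succ[cur]) == 1:
            nxt = next(iter(succ[cur]))
            if len(pred[nxt]) != 1 or nxt in seen:
                break
            label += nxt[-1]  # consecutive k-mers overlap by k - 1 characters
            seen.add(nxt)
            cur = nxt
        labels.append(label)
    return labels


# Two toy reads that share the substring "ACT" in different contexts.
reads = ["GGACTAA", "CCACTTT"]
nodes, succ, pred = de_bruijn(reads, 3)
compacted = compact(nodes, succ, pred)
```

In this toy input the shared k-mer `ACT` has two predecessors and two successors, so it survives compaction as its own node between the flanking paths. Branching of this kind is the usual graph signature of a repeated segment; how the paper exploits the graph structure and its connected components is not specified in the abstract.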