We present an algorithm for detecting long similar fragments that occur at least twice in a set of biological sequences. The problem becomes computationally challenging when the frequency of a repeat is allowed to increase and when a non-negligible number of insertions, deletions and substitutions is allowed. We introduce in this paper an algorithm, Rime (for Repeat Identification: long, Multiple, and with Edits), that performs this task and handles instances whose size and combination of parameters are beyond the reach of existing methods. This is achieved by using a filter as a preprocessing step and then exploiting the information gathered by the filter in the subsequent repeat inference step. To the best of our knowledge, Rime is the first algorithm that can accurately deal with very long repeats (up to a few thousand in length), possibly occurring several times, with a rate of differences (substitutions and indels) among copies of the same repeat of 10-15% or even more.
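The abstract does not describe how the filter works, so the following is only a minimal sketch of the general idea behind a lossless preprocessing filter, assuming the classical q-gram lemma rather than Rime's actual criterion. A string of length L contains L - q + 1 q-grams and a single edit destroys at most q of them, so two fragments within e edits of each other must share at least L - q + 1 - q*e q-grams. Pairs below that bound can be discarded before any expensive comparison. The function names and the parameters `q` and `e` are hypothetical and chosen for illustration.

```python
from collections import Counter


def qgram_threshold(length: int, q: int, e: int) -> int:
    """Lower bound on shared q-grams for fragments within e edits (q-gram lemma)."""
    return max(0, length - q + 1 - q * e)


def qgram_profile(s: str, q: int) -> Counter:
    """Multiset of all overlapping q-grams of s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def shared_qgrams(a: str, b: str, q: int) -> int:
    """Number of q-gram occurrences common to a and b (multiset intersection)."""
    common = qgram_profile(a, q) & qgram_profile(b, q)
    return sum(common.values())


def may_be_similar(a: str, b: str, q: int, e: int) -> bool:
    """Necessary (not sufficient) condition for edit distance <= e.

    Returning False proves the pair is too different; returning True only
    means the pair survives the filter and still needs verification.
    """
    bound = qgram_threshold(max(len(a), len(b)), q, e)
    return shared_qgrams(a, b, q) >= bound


if __name__ == "__main__":
    original = "ACGTACGTAC"
    one_substitution = "ACGTTCGTAC"  # differs from original at one position
    unrelated = "CCCCCCCCCC"
    print(may_be_similar(original, one_substitution, q=3, e=1))
    print(may_be_similar(original, unrelated, q=3, e=1))
```

Because the test is a necessary condition only, such a filter never discards a true repeat, but every surviving pair must still be checked by the inference step. With a difference rate of 10-15%, `q` has to stay small for the bound to remain positive, which is one reason filtering becomes harder as the allowed number of edits grows.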