Motivation: Tandem repeats consist of approximate, adjacent repetitions of a DNA motif. Such repeats account for large portions of eukaryotic genomes and have also been found in other kingdoms of life. Owing to their polymorphism, tandem repeats have proven useful in genome mapping, forensics, and population studies. Nevertheless, they are neither systematically detected nor annotated in genome projects. Partly because of this lack of data, their evolution is still poorly understood.

Results: In this work, we design an exact algorithm to locate approximate tandem repeats (ATR) of a motif in a DNA sequence. Given a motif and a DNA sequence, our method, named STAR, identifies all segments of the sequence that correspond to significant approximate tandem repetitions of the motif. In our model, an Exact Tandem Repeat (ETR) comes from the tandem duplication of the motif, and an ATR derives from an ETR by a series of point mutations. An ATR can therefore be encoded as a number of duplications of the motif together with a list of mutations. Consequently, a sequence that is not an ATR cannot be encoded efficiently by this description, whereas a true ATR can. Our method uses the minimum description length criterion to identify which sequence segments are ATR. Our optimization procedure guarantees that STAR finds a combination of ATR that minimizes this criterion.

Availability: STAR is available for use at http://atgc.lirmm.fr/star

Supplementary information: An appendix is available at http://atgc.lirmm.fr/star under 'Paper and contacts'.
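The encoding idea described in the Results can be illustrated with a small sketch. It is not STAR's actual code or cost model. It assumes that mutations are substitutions only, that a literal encoding costs 2 bits per nucleotide, and that each mutation is stored as a position plus a replacement base. The function names (literal_cost_bits, atr_cost_bits, compression_gain) are hypothetical and chosen for illustration.

```python
import math


def literal_cost_bits(segment: str) -> float:
    """Cost of spelling out the segment base by base (assumed 2 bits per base)."""
    return 2.0 * len(segment)


def atr_cost_bits(segment: str, motif: str) -> float:
    """Cost of describing the segment as motif copies plus a mutation list.

    Simplifying assumption: only substitutions are modelled, so the segment
    is compared position by position with the motif repeated to its length.
    """
    n = len(segment)
    if n == 0:
        return 0.0
    mismatches = sum(
        1 for i, base in enumerate(segment) if base != motif[i % len(motif)]
    )
    # Bits to state the segment length, which fixes the number of copies.
    length_bits = math.log2(n + 1)
    # Bits to state how many mutations follow.
    count_bits = math.log2(n + 1)
    # Each substitution: a position among n, and one of 3 alternative bases.
    per_mutation_bits = math.log2(n) + math.log2(3)
    return length_bits + count_bits + mismatches * per_mutation_bits


def compression_gain(segment: str, motif: str) -> float:
    """Positive when the repeat description is shorter than the literal one."""
    return literal_cost_bits(segment) - atr_cost_bits(segment, motif)


if __name__ == "__main__":
    motif = "ACG"
    # Six copies of ACG with one substitution, then an unrelated segment.
    print(compression_gain("ACGACGACTACGACGACG", motif))
    print(compression_gain("TGCATCGGATTACAGCTA", motif))
```

Under these assumptions, a segment that is close to an exact repetition of the motif yields a positive gain, while an unrelated segment needs so many mutations that the literal encoding is cheaper. This sketch scores a single candidate segment only; it does not attempt the optimization over combinations of segments that the abstract attributes to STAR.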