Exhaustive whole-genome tandem repeats search

Authors:
Arun Krishnan;Francis Tang
Affiliations:
Bioinformatics Institute, 30 Biopolis Street, No. 07-01, Matrix, Singapore 138671, Singapore;Bioinformatics Institute, 30 Biopolis Street, No. 07-01, Matrix, Singapore 138671, Singapore
Venue:
Bioinformatics
Year:
2004



Abstract

Motivation: Approximate tandem repeats (ATRs) occur frequently in genomes and are a source of the polymorphisms observed between individuals, which makes them of interest to those studying genetic disorders. Although extensive work has been done on identifying ATRs, current approaches are inherently limited in the number of pattern sizes they can search or in the length of the input sequence they can handle.

Results: This paper describes (1) a new algorithm that exhaustively finds all variable-length ATRs in a genomic sequence and (2) a precise description of redundancy in the output, together with an algorithm that significantly reduces it. Our ATR definition is parameterized by a mismatch ratio p, which allows more mismatches in longer tandem repeats and fewer in shorter ones. Furthermore, our algorithm is embarrassingly parallel and can therefore attain near-linear speed-up on Beowulf clusters. We present results of our algorithm applied to sequences of widely differing lengths, from genes to chromosomes.

Availability: Source and binaries are available on request.

Supplementary information: http://web.bii.a-star.edu.sg/~francis/Research/Exhaustive/
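
To make the role of the mismatch ratio p concrete, the following is a minimal Python sketch, not the algorithm of the paper. The abstract does not give the formal ATR definition, so the sketch assumes one: a window counts as an ATR of a given period if the number of positions j with seq[j] != seq[j - period] is at most p times the number of positions compared. The function names, the `min_copies` parameter, and the brute-force scan are all hypothetical and chosen for illustration only.

```python
from typing import List, Tuple


def longest_atr(seq: str, start: int, period: int, p: float,
                min_copies: int = 2) -> int:
    """Length of the longest window beginning at `start` that satisfies the
    ASSUMED criterion: mismatches <= p * (positions compared), where each
    position j is compared with position j - period.
    Returns 0 if no window of at least `min_copies` periods qualifies."""
    best = 0
    mismatches = 0
    for j in range(start + period, len(seq)):
        if seq[j] != seq[j - period]:
            mismatches += 1
        length = j - start + 1
        compared = length - period  # positions that have a partner one period back
        if length >= min_copies * period and mismatches <= p * compared:
            best = length
    return best


def scan_period(seq: str, period: int, p: float) -> List[Tuple[int, int, int]]:
    """Brute-force scan of every start position for one period.
    Returns (start, period, length) triples for qualifying windows."""
    hits = []
    for start in range(len(seq)):
        length = longest_atr(seq, start, period, p)
        if length:
            hits.append((start, period, length))
    return hits


def scan_all(seq: str, max_period: int, p: float) -> List[Tuple[int, int, int]]:
    """Scan periods 1..max_period. Each call to scan_period is independent of
    the others, so in this sketch the periods could be given to separate workers."""
    hits = []
    for period in range(1, max_period + 1):
        hits.extend(scan_period(seq, period, p))
    return hits
```

Under this assumed criterion with p = 0.25, the six-base window "ACGACT" (one mismatch in three comparisons) is rejected, while the longer "ACGACTAC" (one mismatch in five comparisons) is accepted, which mirrors the statement that longer repeats tolerate more mismatches. Calling `scan_period("ACGACGACG", 3, 0.0)` returns four nested windows starting at positions 0 to 3; overlapping reports of this kind are one simple instance of output redundancy, although the paper's precise notion of redundancy is not stated in the abstract.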