Trajectory representation using Gabor features for motion-based video retrieval
Pattern Recognition Letters
Abstractions in Process Mining: A Taxonomy of Patterns
BPM '09 Proceedings of the 7th International Conference on Business Process Management
Detection of tandem repeats in DNA sequences based on parametric spectral estimation
IEEE Transactions on Information Technology in Biomedicine - Special section on computational intelligence in medical systems
Efficient exact edit similarity query processing with the asymmetric signature scheme
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Detecting fuzzy amino acid tandem repeats in protein sequences
Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine
Unsupervised learning of patterns in data streams using compression and edit distance
IJCAI'11 Proceedings of the Twenty-Second international joint conference on Artificial Intelligence - Volume Two
Approximate period detection and correction
SPIRE'12 Proceedings of the 19th international conference on String Processing and Information Retrieval
Trie-based similarity search and join
Proceedings of the Joint EDBT/ICDT 2013 Workshops
Asymmetric signature schemes for efficient exact edit similarity query processing
ACM Transactions on Database Systems (TODS)
Classification of Tandem Repeats in the Human Genome
International Journal of Knowledge Discovery in Bioinformatics
Pattern discovery for microsatellite genome analysis
Computers in Biology and Medicine
Motivation: A tandem repeat in DNA is a sequence of two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats occur in the genomes of both eukaryotic and prokaryotic organisms. They are important in numerous fields including disease diagnosis, mapping studies, human identity testing (DNA fingerprinting), sequence homology and population studies. Although tandem repeats have been used by biologists for many years, there are few tools available for performing an exhaustive search for all tandem repeats in a given sequence.

Results: In this paper we describe an efficient algorithm for finding all tandem repeats within a sequence, under the edit distance measure. The contributions of this paper are two-fold: theoretical and practical. We present a precise definition for tandem repeats over the edit distance and an efficient, deterministic algorithm for finding these repeats.

Availability: The algorithm has been implemented in C++, and the software is available upon request and can be used at http://www.sci.brooklyn.cuny.edu/~sokol/trepeats. The use of this tool will assist biologists in discovering new ways that tandem repeats affect both the structure and function of DNA and protein molecules.

Contact: sokol@sci.brooklyn.cuny.edu
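
As a rough illustration of what "tandem repeats under the edit distance measure" means, the following Python sketch enumerates pairs of adjacent substrings whose edit distance is within a chosen bound. It is a brute-force toy and not the efficient algorithm the abstract describes. The function names (`edit_distance`, `naive_tandem_repeats`) and the parameters `min_len` and `max_errors` are assumptions introduced for this example, and the sketch handles only two adjacent copies rather than the general "two or more" case.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def naive_tandem_repeats(seq: str, min_len: int, max_errors: int):
    """Report (start, split, end, distance) for every pair of adjacent
    substrings seq[start:split] and seq[split:end] that differ by at most
    max_errors edits. Hypothetical helper; far slower than a dedicated
    algorithm and intended only to make the definition concrete."""
    hits = []
    n = len(seq)
    for start in range(n):
        for split in range(start + min_len, n):
            left = seq[start:split]
            # The second copy may be shorter or longer by up to max_errors.
            lo = max(min_len, len(left) - max_errors)
            hi = len(left) + max_errors
            for right_len in range(lo, hi + 1):
                end = split + right_len
                if end > n:
                    break
                d = edit_distance(left, seq[split:end])
                if d <= max_errors:
                    hits.append((start, split, end, d))
    return hits


if __name__ == "__main__":
    # An exact repeat of "ACGT", then a copy with one substitution.
    print(naive_tandem_repeats("ACGTACGT", min_len=4, max_errors=0))
    print(naive_tandem_repeats("ACGTACCT", min_len=4, max_errors=1))
```

Because every candidate pair is compared with a full dynamic-programming table, the running time of this sketch grows very quickly with sequence length, which is the kind of cost an efficient exhaustive method needs to avoid.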