A new method for finding approximate repetitions in DNA sequences

Authors:
Di Wang;Guoren Wang;Qingquan Wu;Baichen Chen;Yi Zhao
Affiliations:
College of Information Science & Engineering, Northeastern University, Shenyang, China;College of Information Science & Engineering, Northeastern University, Shenyang, China;College of Information Science & Engineering, Northeastern University, Shenyang, China;College of Information Science & Engineering, Northeastern University, Shenyang, China;College of Information Science & Engineering, Northeastern University, Shenyang, China
Venue:
WAIM '06 Proceedings of the 7th international conference on Advances in Web-Age Information Management
Year:
2006

Citing 6
Cited 0

Algorithms on strings, trees, and sequences: computer science and computational biology
An algorithm for finding tandem repeats of unspecified pattern size. RECOMB '98 Proceedings of the second annual international conference on Computational molecular biology
Efficient Index Structures for String Databases. Proceedings of the 27th International Conference on Very Large Data Bases
An Algorithm for Approximate Tandem Repeats. CPM '93 Proceedings of the 4th Annual Symposium on Combinatorial Pattern Matching
Finding approximate tandem repeats in genomic sequences. RECOMB '04 Proceedings of the eighth annual international conference on Research in computational molecular biology
Finding LPRs in DNA Sequence Based on a New Index — SUA. BIBE '05 Proceedings of the Fifth IEEE Symposium on Bioinformatics and Bioengineering

Abstract

Searching for approximate repetitions in DNA sequences is an important topic in gene analysis. One difficulty is that patterns vary in length, so the similarity between two patterns cannot be judged accurately by edit distance (ED) alone. In this paper we define a new similarity function that considers both the differences and the commonalities between patterns. Given the computational complexity involved, we also propose two new filter methods, based on frequency distance and Pearson correlation, that efficiently select a candidate set of approximate repetitions. We use SUA instead of a sliding window to extract fragments from a DNA sequence, so the patterns of an approximate repetition are not limited in length. The results show that our technique finds more approximate repetitions than tandem repeat finder.
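
The abstract does not give the definitions of the similarity function or the two filters, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that frequency distance is computed on per-nucleotide count vectors (a common formulation in which the value never exceeds the edit distance) and that Pearson correlation is taken over the same count vectors. The function names and the `max_ed` threshold are hypothetical.

```python
from math import sqrt

ALPHABET = "ACGT"  # assumption: fragments contain only these four letters


def edit_distance(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # match or substitute
        prev = cur
    return prev[-1]


def frequency_vector(s):
    """Count of each nucleotide in s, in ALPHABET order."""
    return [s.count(c) for c in ALPHABET]


def frequency_distance(a, b):
    """Larger of the total surplus and total deficit between count vectors.

    One edit operation changes each of these totals by at most one, so the
    result is a lower bound on edit_distance(a, b).
    """
    u, v = frequency_vector(a), frequency_vector(b)
    surplus = sum(x - y for x, y in zip(u, v) if x > y)
    deficit = sum(y - x for x, y in zip(u, v) if y > x)
    return max(surplus, deficit)


def pearson(a, b):
    """Pearson correlation of the two count vectors (0.0 if either is constant)."""
    u, v = frequency_vector(a), frequency_vector(b)
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((x - mu) * (y - mv) for x, y in zip(u, v))
    su = sqrt(sum((x - mu) ** 2 for x in u))
    sv = sqrt(sum((y - mv) ** 2 for y in v))
    if su == 0 or sv == 0:
        return 0.0
    return cov / (su * sv)


def passes_filter(a, b, max_ed):
    """Cheap pre-check: discard pairs whose lower bound already exceeds max_ed."""
    return frequency_distance(a, b) <= max_ed
```

Under these assumptions, a pair that fails `passes_filter` cannot be within `max_ed` edits, so the quadratic-time `edit_distance` only needs to run on the surviving candidates. The same edit distance can also mean very different things at different lengths (one edit in a 4-letter pattern versus one edit in a 40-letter pattern), which is the motivation the abstract gives for a similarity function that goes beyond ED.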