KMRCRelat Algorithm for finding repeated words in sequences: Application on biological sequences

Authors:
N. El Kadhi
Affiliations:
Lab. d'Infor. de Paris Nord, INSTITUT GALILEE, and Atelier de Bio-Informatique, Université Paris VI, and L.E.R.I.A. Lab. EPITECH de recherché en Infor. Appliquée, France. E-mails: e ...
Venue:
Journal of Computational Methods in Sciences and Engineering - Selected papers from the International Conference on Computer Science, Software Engineering, Information Technology, e-Business, and Applications, 2003
Year:
2005

Abstract

Searching for repeated words in sequences is a problem that has been approached in several ways. Methods for finding word repetitions fall into two categories: exact search and approximate search. Exact search methods tolerate no errors or differences between words. Approximate search methods find the words with K differences (or errors) from a target word M. When words are sequences of letters or characters drawn from an alphabet S, these errors can take the form of substitutions, deletions, or insertions of characters. Approximate search methods are more commonly used in bioinformatics because they offer greater flexibility and therefore find more words. There are several sequence analysis techniques; the most frequently used one compares sequences by aligning them. In this paper, we first delimit the scope of our work by studying different techniques. Then, we present a new, fast and efficient algorithm, KMRCRelat, derived from two earlier algorithms. KMRCRelat uses the concept of relational words: we represent a word by its components and the relations between them, and we consider only the components and their successor relations.
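The two categories of search described in the abstract can be illustrated with a naive sketch. It is not KMRCRelat and makes no claim about how the paper's algorithm works. The function names (`exact_repeats`, `edit_distance`, `approximate_occurrences`) are hypothetical, and the sketch assumes that "K differences" means at most K substitutions, deletions, or insertions (Levenshtein distance).

```python
from collections import defaultdict


def exact_repeats(sequence: str, length: int) -> dict[str, list[int]]:
    """Exact search: map each word of the given length that occurs
    more than once to the list of its start positions."""
    positions = defaultdict(list)
    for i in range(len(sequence) - length + 1):
        positions[sequence[i:i + length]].append(i)
    return {word: pos for word, pos in positions.items() if len(pos) > 1}


def edit_distance(a: str, b: str) -> int:
    """Minimum number of substitutions, deletions, and insertions
    needed to turn a into b (Levenshtein distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def approximate_occurrences(sequence: str, target: str, k: int) -> list[tuple[int, str]]:
    """Approximate search (assumption: at most k errors): return
    (start, word) for every substring within k edits of target."""
    hits = []
    m = len(target)
    for start in range(len(sequence)):
        # A word within k edits of target has length between m - k and m + k.
        for length in range(max(1, m - k), m + k + 1):
            end = start + length
            if end > len(sequence):
                break
            word = sequence[start:end]
            if edit_distance(word, target) <= k:
                hits.append((start, word))
    return hits


print(exact_repeats("ACGTACGA", 3))
print(approximate_occurrences("ACGTAGGT", "ACGT", 0))
```

With K = 0 the approximate search reduces to exact matching. Raising K to 1 also reports words such as AGGT, which differs from ACGT by one substitution. This brute-force version recomputes a full distance for every candidate substring, so it is only suitable for short sequences.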