KMRCRelat Algorithm for finding repeated words in sequences: Application on biological sequences

  • Authors:
  • N. El Kadhi

  • Affiliations:
  • Lab. d'Infor. de Paris Nord, INSTITUT GALILEE, and Atelier de Bio-Informatique, Université Paris VI, and L.E.R.I.A. Lab. EPITECH de recherche en Infor. Appliquée, France. E-mails: e ...

  • Venue:
  • Journal of Computational Methods in Sciences and Engineering - Selected papers from the International Conference on Computer Science, Software Engineering, Information Technology, e-Business, and Applications, 2003
  • Year:
  • 2005

Abstract

Searching for repeated words in sequences is a problem that has been addressed in several ways. Methods for finding word repetitions fall into two categories: exact search and approximate search. Exact search methods tolerate no errors or differences between words. Approximate search methods find the words with K differences (or errors) from a target word M. When words are sequences of letters or characters drawn from an alphabet S, these errors take the form of substitutions, deletions, or insertions of characters. Approximate search methods are more commonly used in bioinformatics because their greater flexibility allows more words to be found. There are several sequence analysis techniques; the most frequently used one compares sequences by aligning them. In this paper, we first delimit the scope of our work by studying different techniques. We then present a new, fast, and efficient algorithm, KMRCRelat, derived from two earlier algorithms. KMRCRelat uses the concept of relational words: a word is represented by its components and the relations between them. We consider only the components and their successor relations.
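
To make the distinction between the two categories concrete, the following Python sketch contrasts exact and approximate word search on a short DNA-like string. It is a brute-force illustration only and is not the KMRCRelat algorithm, whose details the abstract does not give. The function names (`repeated_words`, `edit_distance`, `approximate_words`), the example sequence, and the reading of "K differences" as "at most K substitutions, deletions, or insertions" are all assumptions made for this example.

```python
from collections import defaultdict


def repeated_words(seq, length):
    """Exact search: map each word of the given length that occurs
    at least twice in seq to the list of its start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - length + 1):
        positions[seq[i:i + length]].append(i)
    return {word: pos for word, pos in positions.items() if len(pos) > 1}


def edit_distance(a, b):
    """Minimum number of substitutions, deletions and insertions
    needed to turn word a into word b (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
                prev[j - 1] + (ca != cb),  # substitute ca by cb, or match
            ))
        prev = curr
    return prev[-1]


def approximate_words(seq, target, k):
    """Approximate search (assumed meaning: at most k differences).
    Returns (start, word) for every window of seq whose edit distance
    to target is at most k. Window lengths range over len(target) +/- k
    because insertions and deletions change the word length."""
    m = len(target)
    hits = []
    for start in range(len(seq)):
        for length in range(max(1, m - k), m + k + 1):
            if start + length > len(seq):
                break
            word = seq[start:start + length]
            if edit_distance(word, target) <= k:
                hits.append((start, word))
    return hits


if __name__ == "__main__":
    seq = "ACGTACGTTACGA"  # hypothetical example sequence
    print(repeated_words(seq, 4))
    print(approximate_words(seq, "ACGT", 0))
    print(approximate_words(seq, "ACGT", 1))
```

With `k = 0` the approximate search reduces to the exact one, and raising `k` admits more words, which is the added flexibility the abstract refers to. The brute-force window scan is chosen for readability, not speed.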