Languages with mismatches

  • Authors:
  • C. Epifanio; A. Gabriele; F. Mignosi; A. Restivo; M. Sciortino

  • Affiliations:
  • Dipartimento di Matematica e Applicazioni, Università di Palermo, Italy; Dipartimento di Matematica e Applicazioni, Università di Palermo, Italy; Dipartimento di Informatica, Università dell'Aquila, Italy; Dipartimento di Matematica e Applicazioni, Università di Palermo, Italy; Dipartimento di Matematica e Applicazioni, Università di Palermo, Italy

  • Venue:
  • Theoretical Computer Science
  • Year:
  • 2007

Quantified Score

Hi-index 5.23

Abstract

In this paper we study some combinatorial properties of a class of languages that represent sets of words occurring in a text S up to some errors. More precisely, we consider sets of words that occur in a text S with at most k mismatches in any window of size r. The study of this class of languages focuses mainly on two aspects: a parameter called the repetition index, and the set of minimal forbidden words of the language of factors of S with errors. The repetition index of a string S is defined as the smallest integer ℓ such that every string of length ℓ occurs, up to errors, in at most one position of the text S. We prove that there is a strong relation between the repetition index of S and the maximal length of the minimal forbidden words of the language of factors of S with errors. Moreover, the repetition index plays an important role in the construction of an indexing data structure. More precisely, given a text S over a fixed alphabet, we build a data structure for approximate string matching that has average size O(|S| log^{k+1} |S|) and answers queries in time O(|x| + |occ(x)|) for any word x, where occ(x) is the list of all occurrences of x in S up to errors.
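To make the definitions in the abstract concrete, here is a minimal brute-force sketch in Python. It is not the authors' construction and does not reproduce their indexing data structure; it simply enumerates words, so it is exponential in word length and only usable on toy texts. The function names (`occurs_at`, `occurrences`, `repetition_index`, `minimal_forbidden_words`) are hypothetical, and the error model is an assumption: a word u occurs at position i if, compared with S[i:i+|u|], every window of length min(r, |u|) contains at most k mismatches.

```python
from itertools import product


def occurs_at(u: str, s: str, i: int, k: int, r: int) -> bool:
    """Assumed error model: u matches s[i:i+len(u)] with at most k
    mismatches in every window of length min(r, len(u))."""
    v = s[i:i + len(u)]
    if len(v) < len(u):
        return False
    mism = [a != b for a, b in zip(u, v)]
    w = min(r, len(u))
    # Slide a window of width w over the mismatch vector.
    return all(sum(mism[j:j + w]) <= k for j in range(len(u) - w + 1))


def occurrences(u: str, s: str, k: int, r: int) -> list[int]:
    """Naive occ(u): all start positions where u occurs in s up to errors."""
    return [i for i in range(len(s) - len(u) + 1) if occurs_at(u, s, i, k, r)]


def in_language(u: str, s: str, k: int, r: int) -> bool:
    """Membership in the language of factors of s with errors."""
    return bool(occurrences(u, s, k, r))


def repetition_index(s: str, alphabet: str, k: int, r: int) -> int:
    """Smallest l such that every word of length l over `alphabet`
    occurs, up to errors, in at most one position of s."""
    for l in range(1, len(s) + 1):
        if all(len(occurrences("".join(t), s, k, r)) <= 1
               for t in product(alphabet, repeat=l)):
            return l
    # Only reached for an empty s: for l == len(s) there is a single position.
    return len(s)


def minimal_forbidden_words(s: str, alphabet: str, k: int, r: int,
                            max_len: int) -> list[str]:
    """Words w (up to max_len) not in the language whose longest proper
    prefix and longest proper suffix are both in the language."""
    mfw = []
    for n in range(1, max_len + 1):
        for t in product(alphabet, repeat=n):
            w = "".join(t)
            if in_language(w, s, k, r):
                continue
            if n == 1 or (in_language(w[:-1], s, k, r)
                          and in_language(w[1:], s, k, r)):
                mfw.append(w)
    return mfw


if __name__ == "__main__":
    s = "abaab"
    # k = 0 reduces to exact matching.
    print(repetition_index(s, "ab", 0, 2))            # 3
    print(minimal_forbidden_words(s, "ab", 0, 2, 6))  # ['bb', 'aaa', 'bab', 'aaba']
    # With one mismatch allowed per window of size 2.
    print(repetition_index(s, "ab", 1, 2))            # 5
```

With k = 0 the sketch reduces to exact matching, where the repetition index is one more than the length of the longest repeated factor ("ab" in "abaab"). Allowing errors makes more words collide at several positions, which pushes the repetition index up.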