Languages with mismatches and an application to approximate indexing

  • Authors:
  • Chiara Epifanio;Alessandra Gabriele;Filippo Mignosi

  • Affiliations:
  • Dipartimento di Matematica ed Applicazioni, Università degli Studi di Palermo, Palermo, Italy;Dipartimento di Matematica ed Applicazioni, Università degli Studi di Palermo, Palermo, Italy;Dipartimento di Matematica ed Applicazioni, Università degli Studi di Palermo, Palermo, Italy

  • Venue:
  • DLT'05 Proceedings of the 9th international conference on Developments in Language Theory
  • Year:
  • 2005

Abstract

In this paper we describe a factorial language, denoted by L(S,k,r), that contains all words that occur in a string S up to k mismatches every r symbols. We then give some combinatorial properties of a parameter, called the repetition index and denoted by R(S,k,r), defined as the smallest integer h ≥ 1 such that every string of length h occurs in at most one position of the text S up to k mismatches every r symbols. We prove that R(S,k,r) is a non-increasing function of r and a non-decreasing function of k, and that the equation r = R(S,k,r) admits a unique solution. The repetition index plays an important role in the construction of an indexing data structure based on a trie that represents the set of all factors of L(S,k,r) of length R(S,k,r). For each word x ∈ L(S,k,r), this data structure allows us to find the list occ(x) of all occurrences of x in the text S, up to k mismatches every r symbols, in time proportional to |x| + |occ(x)|.
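
To make the definitions above concrete, here is a minimal brute-force sketch in Python. It is not the trie-based index the abstract describes and does not achieve the stated time bound. It rests on an assumed formalization that the abstract does not spell out: a word u occurs at a position of S "up to k mismatches every r symbols" when every window of r consecutive aligned symbols contains at most k mismatches, or, if |u| < r, when the total number of mismatches is at most k. The function names `occurs_at`, `occurrences` and `repetition_index` are hypothetical, and the sketch assumes 0 ≤ k ≤ r and r ≥ 1.

```python
from itertools import product


def occurs_at(u, s, pos, k, r):
    """Assumed definition: u matches s at pos with at most k mismatches
    in every window of r consecutive aligned symbols."""
    if pos < 0 or pos + len(u) > len(s):
        return False
    # mism[i] is True where u and the aligned slice of s differ.
    mism = [a != b for a, b in zip(u, s[pos:pos + len(u)])]
    if len(u) < r:
        # Word shorter than one window: bound the total mismatch count.
        return sum(mism) <= k
    # Otherwise check every sliding window of length r.
    return all(sum(mism[i:i + r]) <= k for i in range(len(u) - r + 1))


def occurrences(u, s, k, r):
    """All start positions (0-based) where u occurs in s under the
    assumed definition; plays the role of occ(x) in the abstract."""
    return [p for p in range(len(s) - len(u) + 1) if occurs_at(u, s, p, k, r)]


def repetition_index(s, k, r):
    """Brute-force R(S,k,r): the smallest h >= 1 such that every word of
    length h occurs in at most one position of s.

    Only words over the symbols of s are enumerated; a symbol absent
    from s mismatches everywhere, so it cannot create a repeated
    occurrence that some word over the symbols of s would not.
    Exponential in h; intended for tiny inputs only.
    """
    alphabet = sorted(set(s))
    for h in range(1, len(s) + 1):
        if all(len(occurrences("".join(w), s, k, r)) <= 1
               for w in product(alphabet, repeat=h)):
            return h
    return len(s)  # reached only for the empty string
```

Under these assumptions, `occurrences("ab", "abaab", 0, 2)` returns `[0, 3]`, while allowing one mismatch per window of two symbols, `occurrences("ab", "abaab", 1, 2)`, returns `[0, 2, 3]`. With k = 0 the repetition index reduces to the exact case: `repetition_index("abaab", 0, 2)` returns `3`, since "ab" is repeated but every factor of length 3 is unique.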