Languages with mismatches and an application to approximate indexing

Authors:
Chiara Epifanio;Alessandra Gabriele;Filippo Mignosi
Affiliations:
Dipartimento di Matematica ed Applicazioni, Università degli Studi di Palermo, Palermo, Italy;Dipartimento di Matematica ed Applicazioni, Università degli Studi di Palermo, Palermo, Italy;Dipartimento di Matematica ed Applicazioni, Università degli Studi di Palermo, Palermo, Italy
Venue:
DLT'05 Proceedings of the 9th international conference on Developments in Language Theory
Year:
2005

Abstract

In this paper we describe a factorial language, denoted by L(S,k,r), that contains all words that occur in a string S up to k mismatches every r symbols. We then give some combinatorial properties of a parameter called the repetition index, denoted by R(S,k,r) and defined as the smallest integer h ≥ 1 such that every string of length h occurs in at most one position of the text S up to k mismatches every r symbols. We prove that R(S,k,r) is a non-increasing function of r and a non-decreasing function of k, and that the equation r = R(S,k,r) admits a unique solution. The repetition index plays an important role in the construction of an indexing data structure based on a trie that represents the set of all factors of L(S,k,r) of length R(S,k,r). For each word x ∈ L(S,k,r), this data structure allows us to find the list occ(x) of all occurrences of x in the text S up to k mismatches every r symbols in time proportional to |x| + |occ(x)|.
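
To make the definitions above concrete, here is a small brute-force sketch in Python. It is not the trie-based index described in the abstract and it runs in exponential time; it only illustrates occ(x) and R(S,k,r) on tiny inputs. The function names (`kr_occurs_at`, `occ`, `repetition_index`) are hypothetical, and the reading of "up to k mismatches every r symbols" as "at most k mismatches in every window of r consecutive symbols, or in the whole word when it is shorter than r" is an assumption of this sketch. Positions are 0-based.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def kr_occurs_at(u, s, pos, k, r):
    """True if u occurs in s at 0-based position pos with at most k
    mismatches in every window of r consecutive symbols (assumed reading)."""
    seg = s[pos:pos + len(u)]
    if len(seg) != len(u):
        return False
    if len(u) < r:
        # Word shorter than one window: bound the mismatches of the whole word.
        return hamming(u, seg) <= k
    return all(
        hamming(u[i:i + r], seg[i:i + r]) <= k
        for i in range(len(u) - r + 1)
    )


def occ(u, s, k, r):
    """All positions of s at which u occurs up to k mismatches every r symbols."""
    return [p for p in range(len(s) - len(u) + 1) if kr_occurs_at(u, s, p, k, r)]


def repetition_index(s, k, r):
    """Smallest h >= 1 such that every word of length h occurs in at most
    one position of s, found by exhaustive search.

    Only words over the symbols of s are tried: replacing a foreign symbol
    by a symbol of s can never increase a mismatch count.
    """
    alphabet = sorted(set(s))
    for h in range(1, len(s) + 1):
        if all(len(occ("".join(w), s, k, r)) <= 1
               for w in product(alphabet, repeat=h)):
            return h
    return len(s)  # only reached for the empty string


if __name__ == "__main__":
    text = "abaab"
    print(occ("ab", text, 1, 2))
    print(repetition_index(text, 0, 2))
```

With k = 0 the sketch reduces to exact matching, so `repetition_index` returns the smallest length at which no factor of the text repeats. Comparing its values for several k and r on small texts is a quick way to observe the monotonicity stated in the abstract.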