Algorithms on strings, trees, and sequences: computer science and computational biology
Average Case Analysis of Algorithms on Sequences
Efficient algorithms for document retrieval problems
SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Dictionary matching and indexing with errors and don't cares
STOC '04 Proceedings of the thirty-sixth annual ACM symposium on Theory of computing
Extracting approximate patterns
CPM'03 Proceedings of the 14th annual conference on Combinatorial pattern matching
Indexing structures for approximate string matching
CIAC'03 Proceedings of the 5th Italian conference on Algorithms and complexity
Theoretical Computer Science
From Nerode's congruence to suffix automata with mismatches
Theoretical Computer Science
On the suffix automaton with mismatches
CIAA'07 Proceedings of the 12th international conference on Implementation and application of automata
Dictionary-symbolwise flexible parsing
Journal of Discrete Algorithms
In this paper we describe a factorial language, denoted by L(S,k,r), that contains all words that occur in a string S up to k mismatches every r symbols. We then give some combinatorial properties of a parameter called the repetition index, denoted by R(S,k,r) and defined as the smallest integer h ≥ 1 such that every string of length h occurs in at most one position of the text S, up to k mismatches every r symbols. We prove that R(S,k,r) is a non-increasing function of r and a non-decreasing function of k, and that the equation r = R(S,k,r) admits a unique solution. The repetition index plays an important role in the construction of an indexing data structure based on a trie that represents the set of all factors of L(S,k,r) having length equal to R(S,k,r). For each word x ∈ L(S,k,r), this data structure allows us to find the list occ(x) of all occurrences of x in the text S, up to k mismatches every r symbols, in time proportional to |x| + |occ(x)|.
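To make the definitions concrete, the following is a minimal brute-force sketch in Python. It is not the trie-based index of the paper and takes exponential time. The function names are hypothetical, and two points are assumptions rather than statements from the abstract: a word shorter than r is treated as a single window, and candidate words are drawn only from the letters that appear in S.

```python
from itertools import product


def occurs_at(S, x, i, k, r):
    """Return True if x matches S[i:i+len(x)] with at most k mismatches
    in every window of r consecutive symbols.

    Assumption: when len(x) < r, the whole word is a single window.
    """
    if i < 0 or i + len(x) > len(S):
        return False
    mismatches = [int(a != b) for a, b in zip(x, S[i:i + len(x)])]
    w = min(r, len(x))
    return all(sum(mismatches[j:j + w]) <= k
               for j in range(len(x) - w + 1))


def occurrences(S, x, k, r):
    """occ(x): all positions of S where x occurs up to k mismatches every r symbols."""
    return [i for i in range(len(S) - len(x) + 1) if occurs_at(S, x, i, k, r)]


def repetition_index(S, k, r):
    """Smallest h >= 1 such that every word of length h occurs in at most
    one position of S, found by exhaustive enumeration.

    Assumption: the alphabet is the set of letters appearing in S.
    """
    alphabet = sorted(set(S))
    for h in range(1, len(S) + 1):
        if all(len(occurrences(S, "".join(word), k, r)) <= 1
               for word in product(alphabet, repeat=h)):
            return h
    return None  # only reached for the empty string


print(occurrences("abaab", "ab", 1, 2))   # [0, 2, 3]
print(repetition_index("abaab", 0, 1))    # 3
```

With k = 0 the sketch reduces to exact matching: in "abaab" the factor "ab" occurs twice, while every factor of length 3 occurs once, so the repetition index is 3. Allowing one mismatch in every two symbols adds position 2 to the occurrences of "ab".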