In this paper we study some combinatorial properties of a class of languages that represent sets of words occurring in a text S up to some errors. More precisely, we consider sets of words that occur in a text S with at most k mismatches in every window of size r. The study of this class of languages focuses mainly on two objects: a parameter called the repetition index, and the set of minimal forbidden words of the language of factors of S with errors. The repetition index of S is defined as the smallest integer such that every string of that length occurs, up to errors, in at most one position of S. We prove that there is a strong relation between the repetition index of S and the maximal length of the minimal forbidden words of the language of factors of S with errors. Moreover, the repetition index plays an important role in the construction of an indexing data structure. More precisely, given a text S over a fixed alphabet, we build a data structure for approximate string matching that has average size $O(|S| \log^{k+1} |S|)$ and answers queries in time $O(|x| + |occ(x)|)$ for any word x, where occ(x) is the list of all occurrences of x in S up to errors.
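To make the definitions above concrete, the following is a brute-force Python sketch, not the paper's construction. It assumes that a word shorter than r is checked as a single window (at most k mismatches overall) and that r >= 1. The helper names `occurs_at`, `occurrences`, `in_language`, `repetition_index` and `minimal_forbidden_words` are hypothetical. Because it enumerates every word over the alphabet, it is exponential and only suitable for tiny texts. It illustrates the repetition index and the minimal forbidden words but does not reproduce the $O(|S| \log^{k+1} |S|)$ index.

```python
from itertools import product


def occurs_at(x: str, s: str, i: int, k: int, r: int) -> bool:
    """True if x occurs at position i of s with at most k mismatches in every
    window of min(r, len(x)) consecutive positions (assumes r >= 1).
    Assumption: a word shorter than r is checked as one single window."""
    n = len(x)
    if i < 0 or i + n > len(s):
        return False
    # 1 where x and the aligned factor of s disagree, 0 where they agree
    mism = [int(x[j] != s[i + j]) for j in range(n)]
    w = min(r, n)
    # every window of width w must contain at most k mismatches
    return all(sum(mism[j:j + w]) <= k for j in range(n - w + 1))


def occurrences(x: str, s: str, k: int, r: int) -> list[int]:
    """All start positions of x in s up to errors."""
    return [i for i in range(len(s) - len(x) + 1) if occurs_at(x, s, i, k, r)]


def in_language(x: str, s: str, k: int, r: int) -> bool:
    """Membership in the language of factors of s with errors."""
    return bool(occurrences(x, s, k, r))


def repetition_index(s: str, alphabet: str, k: int, r: int) -> int:
    """Smallest h such that every word of length h over `alphabet` occurs
    in s, up to errors, at most once (brute force over all words)."""
    for h in range(len(s) + 1):
        if all(len(occurrences("".join(w), s, k, r)) <= 1
               for w in product(alphabet, repeat=h)):
            return h
    # Unreachable: at h = len(s) only position 0 is available.
    raise AssertionError("unreachable")


def minimal_forbidden_words(s: str, alphabet: str, k: int, r: int,
                            max_len: int) -> list[str]:
    """Words x of length 1..max_len that are not in the language while
    their prefix x[:-1] and suffix x[1:] both are."""
    result = []
    for n in range(1, max_len + 1):
        for w in product(alphabet, repeat=n):
            x = "".join(w)
            if (not in_language(x, s, k, r)
                    and in_language(x[:-1], s, k, r)
                    and in_language(x[1:], s, k, r)):
                result.append(x)
    return result


if __name__ == "__main__":
    s = "abaab"
    print(repetition_index(s, "ab", k=0, r=2))
    print(minimal_forbidden_words(s, "ab", k=0, r=2, max_len=4))
    print(repetition_index(s, "ab", k=1, r=2))
```

For S = abaab over {a, b} with k = 0 (exact matching), the sketch gives repetition index 3 and minimal forbidden words bb, aaa, bab, aaba up to length 4. With k = 1 and r = 2 the repetition index rises to 5, since aaaa lies within the error bound of both abaa and baab.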