Languages with mismatches

Authors:
C. Epifanio;A. Gabriele;F. Mignosi;A. Restivo;M. Sciortino
Affiliations:
Dipartimento di Matematica e Applicazioni, Università di Palermo, Italy;Dipartimento di Matematica e Applicazioni, Università di Palermo, Italy;Dipartimento di Informatica, Università dell'Aquila, Italy;Dipartimento di Matematica e Applicazioni, Università di Palermo, Italy;Dipartimento di Matematica e Applicazioni, Università di Palermo, Italy
Venue:
Theoretical Computer Science
Year:
2007

Abstract

In this paper we study some combinatorial properties of a class of languages that represent sets of words occurring in a text S up to some errors. More precisely, we consider sets of words that occur in a text S with at most k mismatches in every window of size r. The study of this class of languages focuses mainly on two objects: a parameter called the repetition index, and the set of minimal forbidden words of the language of factors of S with errors. The repetition index of S is defined as the smallest integer h such that every string of length h occurs, up to errors, in at most one position of the text S. We prove that there is a strong relation between the repetition index of S and the maximal length of the minimal forbidden words of the language of factors of S with errors. Moreover, the repetition index plays an important role in the construction of an indexing data structure. More precisely, given a text S over a fixed alphabet, we build a data structure for approximate string matching that has average size $O(|S| \log^{k+1} |S|)$ and answers queries in time $O(|x| + |occ(x)|)$ for any word x, where occ(x) is the list of all occurrences of x in S up to errors.
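The abstract states its three central notions only in words: occurrence with at most k mismatches in every window of size r, the repetition index, and minimal forbidden words. The Python sketch below makes them concrete by brute force on tiny inputs. It is not the paper's indexing data structure. The function names (window_ok, occurrences, in_language, repetition_index, minimal_forbidden_words) are hypothetical. One convention is an assumption: a word no longer than r is checked as a single window, which amounts to Hamming distance at most k. Minimal forbidden words use the standard definition for a factorial language: w is minimal forbidden if w is not in the language while its longest proper prefix and longest proper suffix are.

```python
from itertools import count, product


def window_ok(u: str, f: str, k: int, r: int) -> bool:
    """True if u and f (equal length) differ in at most k positions inside
    every window of r consecutive positions.

    Assumption: when len(u) <= r the whole word is treated as one window,
    i.e. plain Hamming distance <= k.
    """
    mism = [a != b for a, b in zip(u, f)]
    if len(mism) <= r:
        return sum(mism) <= k
    errors = sum(mism[:r])              # mismatches in the first window
    if errors > k:
        return False
    for i in range(r, len(mism)):       # slide the window one step right
        errors += mism[i] - mism[i - r]
        if errors > k:
            return False
    return True


def occurrences(u: str, s: str, k: int, r: int) -> list[int]:
    """Start positions where u occurs in s up to k mismatches per r-window."""
    n = len(u)
    return [i for i in range(len(s) - n + 1) if window_ok(u, s[i:i + n], k, r)]


def in_language(w: str, s: str, k: int, r: int) -> bool:
    """Membership in the language of factors of s with errors."""
    return bool(occurrences(w, s, k, r))


def repetition_index(s: str, alphabet: str, k: int, r: int) -> int:
    """Smallest h >= 1 such that every word of length h over `alphabet`
    occurs in at most one position of s up to errors.

    Brute force over alphabet**h; it terminates because for h >= len(s)
    only position 0 (or none) is available.
    """
    for h in count(1):
        words = ("".join(t) for t in product(alphabet, repeat=h))
        if all(len(occurrences(w, s, k, r)) <= 1 for w in words):
            return h


def minimal_forbidden_words(s: str, alphabet: str, k: int, r: int) -> list[str]:
    """Words w not in the language whose longest proper prefix and suffix
    are both in it (enough because the language is factorial)."""
    result = []
    # A word longer than len(s) + 1 cannot be minimal: its prefix of
    # length len(s) + 1 has no occurrence and is already forbidden.
    for m in range(1, len(s) + 2):
        for t in product(alphabet, repeat=m):
            w = "".join(t)
            if in_language(w, s, k, r):
                continue
            if in_language(w[:-1], s, k, r) and in_language(w[1:], s, k, r):
                result.append(w)
    return result


if __name__ == "__main__":
    print(occurrences("abc", "abcabd", 1, 3))           # [0, 3]
    print(repetition_index("abab", "ab", 0, 1))         # 3
    print(minimal_forbidden_words("abab", "ab", 0, 1))  # ['aa', 'bb', 'baba']
```

With k = 0 the repetition index reduces to one more than the length of the longest factor occurring at two distinct positions ("ab" in "abab", giving 3). Enumerating all |Σ|^h candidate words is exponential in h, so this sketch is only meant for checking small examples by hand. It says nothing about how the paper's index attains the stated size and query bounds.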