Note: k-difference matching in amortized linear time for all the words in a text

Authors:
Cinzia Pizzi
Affiliations:
Department of Information Engineering, University of Padova, Italy
Venue:
Theoretical Computer Science
Year:
2009




Abstract

Given a text x of length n, we study the k-difference problem for all the words, of either fixed or variable length, taken from the text itself. The result finds application in pattern discovery in biosequences, where over- or under-represented words are extracted from the input sequences. The proposed algorithm runs in amortized linear time per word. This improves on the complexity obtained by applying well-known algorithms to each of the O(n) fixed-length words or O(n^2) variable-length words in x by a factor of k, k log k, or m log m, depending on the chosen algorithm. The space required is O(n) if we only count the occurrences, or O(n^2) if we also store their positions. This second scenario can serve as the basis for other applications, such as searching for gapped factors with mismatches or approximate pattern matching extended to any word.
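
To make the problem statement concrete, the following is a minimal sketch of the naive baseline the abstract compares against, not the paper's amortized algorithm. It assumes "k-difference" means edit distance at most k, and it uses the classical dynamic-programming approach (often attributed to Sellers), which costs O(mn) time per word of length m. The function names `k_difference_end_positions` and `count_all_fixed_length_words` are hypothetical and chosen only for illustration.

```python
def k_difference_end_positions(text, pattern, k):
    """Return the end positions j in text such that some substring
    ending at j is within edit distance k of pattern.

    Classical column-by-column dynamic programming, O(len(pattern) * len(text)).
    This is an assumed baseline, not the algorithm proposed in the paper.
    """
    m = len(pattern)
    # col[i] = min edit distance between pattern[:i] and any substring
    # of the text ending at the current position. col[0] stays 0 because
    # an occurrence may start anywhere in the text.
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text):
        prev_diag = col[0]  # value of the cell up-left of the one being filled
        for i in range(1, m + 1):
            old = col[i]  # value from the previous text position
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(prev_diag + cost,  # match or substitution
                         old + 1,           # extra character in the text
                         col[i - 1] + 1)    # missing character in the text
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


def count_all_fixed_length_words(text, m, k):
    """For every distinct word of length m in text, count the end positions
    of its k-difference occurrences. Stores only counts, not positions."""
    counts = {}
    for start in range(len(text) - m + 1):
        word = text[start:start + m]
        if word not in counts:
            counts[word] = len(k_difference_end_positions(text, word, k))
    return counts
```

Running this baseline over all O(n) fixed-length words repeats the full scan of the text for each word, which is the per-word cost the proposed algorithm is stated to improve on. Keeping only counts, as in `count_all_fixed_length_words`, corresponds to the counting scenario in the abstract; storing the lists returned by `k_difference_end_positions` corresponds to the scenario that also keeps positions.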