Worst case efficient single and multiple string matching in the RAM model

  • Authors:
  • Djamal Belazzougui

  • Affiliations:
  • LIAFA, Univ. Paris Diderot - Paris 7, Paris Cedex 13, France

  • Venue:
  • IWOCA'10 Proceedings of the 21st international conference on Combinatorial algorithms
  • Year:
  • 2010

Abstract

In this paper, we explore worst-case solutions for the problems of single and multiple pattern matching on strings in the RAM model with word length w. In the first problem, we have a pattern p of length m over an alphabet of size σ, and given any text T of length n, where each character is encoded using log σ bits, we wish to find all occurrences of p. In the multi-pattern matching problem, we have a set S of d patterns of total length m, and a query on a text T consists of finding all occurrences in T of the patterns in S (in the following, occ denotes the number of reported occurrences). As each character of the text is encoded using log σ bits and we can read w bits in constant time in the RAM model, the best possible query time for the two problems, which can only be achieved by reading Θ(w/log σ) consecutive characters at a time, is O(n log σ/w + occ). We present two results. The first is that, using O(m) words of space, single pattern matching queries can be answered in time O(n(log m/m + log σ/w) + occ), and multiple pattern matching queries can be answered in time O(n((log d + log y + log log m)/y + log σ/w) + occ), where y is the length of the shortest pattern. Our second result is a variant of the first that uses the Four Russians technique to remove the dependence on the shortest pattern length, at the expense of additional space t. It answers multi-pattern matching queries in time O(n(log d + log log_σ t + log log m)/log_σ t + occ) using O(m + t) words of space.
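The packing assumption behind the O(n log σ/w + occ) bound can be illustrated with a minimal sketch. It is not the paper's algorithm. It only shows how a text over a small alphabet fits into about n log σ/w machine words, so that one word access yields roughly w/log σ characters. The function names `pack_text` and `read_char`, the DNA alphabet, and the choice w = 64 are assumptions made for illustration.

```python
def pack_text(text, alphabet, w=64):
    """Pack `text` into w-bit words, using ceil(log2(sigma)) bits per character.

    Hypothetical helper for illustration; not taken from the paper.
    """
    sigma = len(alphabet)
    # Bits per character: ceil(log2(sigma)), computed without floating point.
    bits = max(1, (sigma - 1).bit_length())
    code = {c: i for i, c in enumerate(alphabet)}
    per_word = w // bits  # characters obtained by a single word read
    words = []
    for start in range(0, len(text), per_word):
        word = 0
        for offset, ch in enumerate(text[start:start + per_word]):
            word |= code[ch] << (offset * bits)
        words.append(word)
    return words, bits, per_word


def read_char(words, i, bits, per_word):
    """Return the integer code of the i-th character of the packed text."""
    word = words[i // per_word]
    return (word >> ((i % per_word) * bits)) & ((1 << bits) - 1)


# Assumed example: DNA text, sigma = 4, so 2 bits per character.
text = "ACGT" * 40  # n = 160 characters
words, bits, per_word = pack_text(text, "ACGT")
print(len(words), bits, per_word)  # 5 2 32
```

Here n = 160 characters occupy 160 · 2 / 64 = 5 words, so any algorithm that must inspect the whole text performs at least that many word reads. This is the source of the n log σ/w term.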