String matching with alphabet sampling

  • Authors:
  • Francisco Claude; Gonzalo Navarro; Hannu Peltola; Leena Salmela; Jorma Tarhio

  • Affiliations:
  • David R. Cheriton School of Computer Science, University of Waterloo, Canada; Department of Computer Science, University of Chile, Chile; Department of Computer Science and Engineering, Aalto University, Finland; Department of Computer Science, University of Helsinki, Finland; Department of Computer Science and Engineering, Aalto University, Finland

  • Venue:
  • Journal of Discrete Algorithms
  • Year:
  • 2012

Abstract

We introduce a novel alphabet sampling technique for speeding up both online and indexed string matching. We choose a subset of the alphabet and extract the corresponding subsequence of the text. Online or indexed searching is then carried out on the extracted subsequence, and candidate matches are verified in the full text. We show that this speeds up online searching, especially for moderate to long patterns, by a factor of up to 5, while using 14% extra space in our experiments. For indexed searching we achieve indexes that are as fast as the classical suffix array, yet occupy less than 50% extra space (instead of the usual 400%). Our experiments show no competitive alternatives exist in a wide space/time range.
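
The online variant described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_sample`, `sampled_search`) are hypothetical, the sampled text is scanned with Python's built-in `str.find` rather than any particular matching algorithm, and the original position of every sampled character is stored, which is a simplification that costs far more space than the figures reported above. A pattern with no sampled characters falls back to a plain scan of the full text.

```python
def build_sample(text, sampled_alphabet):
    """Extract the subsequence of `text` made of sampled characters.

    Returns the sampled text and, for each sampled character, its position
    in the original text (assumption: every position is kept, for simplicity).
    """
    positions = [i for i, c in enumerate(text) if c in sampled_alphabet]
    sampled_text = "".join(text[i] for i in positions)
    return sampled_text, positions


def plain_find_all(text, pattern):
    """Report all (possibly overlapping) occurrences by scanning the full text."""
    matches = []
    i = text.find(pattern)
    while i != -1:
        matches.append(i)
        i = text.find(pattern, i + 1)
    return matches


def sampled_search(text, pattern, sampled_alphabet, sampled_text, positions):
    """Search the sampled text, then verify each candidate in the full text."""
    if not pattern:
        return []
    sampled_pattern = "".join(c for c in pattern if c in sampled_alphabet)
    if not sampled_pattern:
        # Nothing to search for in the sample: fall back to the full text.
        return plain_find_all(text, pattern)

    # Offset of the first sampled character inside the pattern.
    offset = next(i for i, c in enumerate(pattern) if c in sampled_alphabet)

    matches = []
    j = sampled_text.find(sampled_pattern)
    while j != -1:
        # Map the candidate back to a start position in the original text.
        start = positions[j] - offset
        # Verification step: compare the whole pattern against the full text.
        if start >= 0 and text.startswith(pattern, start):
            matches.append(start)
        j = sampled_text.find(sampled_pattern, j + 1)
    return matches


text = "abracadabra"
alphabet = {"b", "c", "d"}          # hypothetical choice of sampled characters
sampled_text, positions = build_sample(text, alphabet)
print(sampled_text)                                                       # bcdb
print(sampled_search(text, "abra", alphabet, sampled_text, positions))    # [0, 7]
```

In this example the sampled text has 4 characters instead of 11, so the scan touches fewer characters, and every occurrence of the sampled pattern is only a candidate until the full pattern is checked at the mapped position. No true occurrence is missed, because the sampled characters of any occurrence appear contiguously in the sampled text.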