Speeding Up Pattern Matching by Text Sampling

  • Authors:
  • Francisco Claude; Gonzalo Navarro; Hannu Peltola; Leena Salmela; Jorma Tarhio

  • Affiliations:
  • Department of Computer Science, University of Chile; Department of Computer Science, University of Chile; Department of Computer Science and Engineering, Helsinki University of Technology; Department of Computer Science and Engineering, Helsinki University of Technology; Department of Computer Science and Engineering, Helsinki University of Technology

  • Venue:
  • SPIRE '08 Proceedings of the 15th International Symposium on String Processing and Information Retrieval
  • Year:
  • 2008

Abstract

We introduce a novel alphabet sampling technique for speeding up both online and indexed string matching. We choose a subset of the alphabet and select the corresponding subsequence of the text. Online or indexed searching is then carried out on that subsequence, and candidate matches are verified in the full text. We show that this speeds up online searching by a factor of up to 5, especially for moderate to long patterns. For indexed searching we achieve indexes that are as fast as the classical suffix array, yet occupy less than 0.5 times the text size (instead of 4 times), in addition to the text itself. Our experiments show no competitive alternatives in a wide space/time range.
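
The online variant described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (build_sample, find_all, sampled_search) are hypothetical, the sketch stores the text position of every sampled character rather than a sparser mapping, and it uses Python's built-in str.find in place of a specialized matching algorithm. It only shows the three steps named above: restrict the text and the pattern to the sampled alphabet, search the sampled pattern in the sampled text, then verify each candidate against the full text.

```python
def build_sample(text, sampled_alphabet):
    """Return the subsequence of text over sampled_alphabet and, for each
    sampled character, its position in the original text (an assumption of
    this sketch: every position is stored)."""
    positions = [i for i, c in enumerate(text) if c in sampled_alphabet]
    sampled_text = "".join(text[i] for i in positions)
    return sampled_text, positions


def find_all(haystack, needle):
    """All (possibly overlapping) start indices of needle in haystack."""
    hits = []
    i = haystack.find(needle)
    while i != -1:
        hits.append(i)
        i = haystack.find(needle, i + 1)
    return hits


def sampled_search(text, pattern, sampled_alphabet):
    """Find all occurrences of pattern in text via alphabet sampling."""
    sampled_text, positions = build_sample(text, sampled_alphabet)
    sampled_pattern = "".join(c for c in pattern if c in sampled_alphabet)
    if not sampled_pattern:
        # No sampled character in the pattern: fall back to a plain scan.
        return find_all(text, pattern)
    # Offset of the first sampled character inside the pattern.
    offset = next(i for i, c in enumerate(pattern) if c in sampled_alphabet)
    matches = []
    for j in find_all(sampled_text, sampled_pattern):
        start = positions[j] - offset  # candidate start in the full text
        # Verification step: compare the whole pattern at the candidate.
        if start >= 0 and text.startswith(pattern, start):
            matches.append(start)
    return matches


if __name__ == "__main__":
    # Drop the frequent character 'a' from the sampled alphabet.
    print(sampled_search("abracadabra", "abra", set("brcd")))
```

In this sketch, leaving the most frequent characters out of the sampled alphabet shortens the sampled text, which is where any speedup would come from; the cost is that more candidates may fail verification. In practice the sample would be built once and reused across queries rather than rebuilt on every call.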