Phrase-Based Pattern Matching in Compressed Text

  • Authors:
  • J. Shane Culpepper;Alistair Moffat

  • Affiliations:
  • NICTA Victoria Laboratory, Department of Computer Science and Software Engineering, The University of Melbourne, Victoria, Australia;NICTA Victoria Laboratory, Department of Computer Science and Software Engineering, The University of Melbourne, Victoria, Australia

  • Venue:
  • SPIRE'06 Proceedings of the 13th international conference on String Processing and Information Retrieval
  • Year:
  • 2006

Abstract

Byte codes are a practical alternative to the traditional bit-oriented compression approaches when large alphabets are being used, and trade away a small amount of compression effectiveness for a relatively large gain in decoding efficiency. Byte codes also have the advantage of being searchable using standard string matching techniques. Here we describe methods for searching in byte-coded compressed text and investigate the impact of large alphabets on traditional string matching techniques. We also describe techniques for phrase-based searching in a restricted type of byte code, and present experimental results that compare our adapted methods with previous approaches.
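
To make the idea of searching directly in byte-coded text concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simple tagged byte code in which the high bit of a byte marks the last byte of a codeword (a "stopper"), and it assigns word identifiers in order of first occurrence rather than by frequency. The names `encode_symbol`, `encode_sequence`, `build_vocab`, and `find_phrase` are hypothetical and chosen only for illustration. A phrase is encoded with the same code and located with Python's built-in `bytes.find`; a candidate match is accepted only if it begins on a codeword boundary.

```python
from typing import Dict, Iterable, List


def encode_symbol(n: int) -> bytes:
    """Encode a non-negative integer as a tagged byte codeword.

    Assumption for this sketch: 7 payload bits per byte, and the high bit
    is set only on the final byte of the codeword (the stopper).
    """
    out = [0x80 | (n & 0x7F)]  # stopper byte carries the low 7 bits
    n >>= 7
    while n > 0:
        out.append(n & 0x7F)   # continuer bytes have the high bit clear
        n >>= 7
    return bytes(reversed(out))


def encode_sequence(ids: Iterable[int]) -> bytes:
    """Concatenate the codewords for a sequence of symbol identifiers."""
    return b"".join(encode_symbol(i) for i in ids)


def build_vocab(words: Iterable[str]) -> Dict[str, int]:
    """Assign identifiers in order of first occurrence (an illustrative choice)."""
    vocab: Dict[str, int] = {}
    for w in words:
        vocab.setdefault(w, len(vocab))
    return vocab


def find_phrase(data: bytes, pattern: bytes) -> List[int]:
    """Return byte offsets where `pattern` occurs on a codeword boundary.

    A raw byte match can start in the middle of a longer codeword, so a hit
    is kept only if it is at offset 0 or the preceding byte is a stopper.
    """
    hits: List[int] = []
    pos = data.find(pattern)
    while pos != -1:
        if pos == 0 or data[pos - 1] & 0x80:
            hits.append(pos)
        pos = data.find(pattern, pos + 1)
    return hits


words = "the cat sat on the mat the cat ran".split()
vocab = build_vocab(words)
compressed = encode_sequence(vocab[w] for w in words)
phrase = encode_sequence(vocab[w] for w in ["the", "cat"])
offsets = find_phrase(compressed, phrase)
```

Because every pattern ends with a stopper byte and matching bytes have identical tag bits, checking the boundary at the start of a hit is enough in this sketch to rule out matches that straddle codewords. Any standard string matching routine could replace `bytes.find` here.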