A Modified Burrows-Wheeler Transformation for Case-Insensitive Search with Application to Suffix Array Compression

  • Authors:
  • Kunihiko Sadakane

  • Venue:
  • DCC '99 Proceedings of the Conference on Data Compression
  • Year:
  • 1999


Abstract

We propose a modified Burrows-Wheeler transformation. Using it, we can obtain, from a compressed text, a suffix array that supports case-insensitive searches. Exact queries can also be answered from the results of a case-insensitive search, because the original text can be decoded from the compressed text. The transformation applies not only to case-insensitive search but also to more general character conversions. We call such a conversion unification, and the text after conversion the unified text. The proposed transformation is defined by the suffix array of the unified text. It is not a permutation of the alphabet followed by the original transformation, but a combination of unification and the original transformation. From a text compressed with our transformation, we can recover both the original text and the suffix array of the unified text. After decoding, ambiguous searches such as case-insensitive search can be performed using this suffix array. Experimental results show that our transformation has very little effect on the compression ratio. Although decompression and search take more time than decoding with the original block-sorting compressor followed by the grep command, finding the positions of keywords is fast, which makes the method useful for advanced searches.
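The core idea (sort the suffixes of the unified text, but emit the original characters) can be illustrated with a minimal Python sketch. It assumes that unification is ASCII case folding via `str.lower()`, a one-to-one character mapping. It also uses a hypothetical `\0` sentinel and a naive suffix sort, and it omits the entropy-coding stage entirely. The names `unify`, `transform`, `inverse`, `find_all`, and `exact_find` are illustrative and are not taken from the paper. Decoding uses a standard LF-mapping walk, adapted so that rows are grouped by the unified form of their last character. This is one way such a transform can be inverted, not necessarily the paper's exact procedure.

```python
from collections import Counter

SENTINEL = "\0"  # assumed end marker: unique and smaller than every text character


def unify(ch: str) -> str:
    """Hypothetical unification: ASCII case folding (one char in, one char out)."""
    return ch.lower()


def transform(text: str) -> str:
    """Sort suffixes of the *unified* text, but emit the *original*
    character preceding each suffix (naive O(n^2 log n) suffix sort)."""
    s = text + SENTINEL
    u = "".join(unify(c) for c in s)
    sa = sorted(range(len(s)), key=lambda i: u[i:])
    return "".join(s[i - 1] for i in sa)  # for i == 0, s[-1] is the sentinel


def inverse(last: str) -> tuple[str, list[int]]:
    """Recover the original text and the suffix array of the unified text."""
    n = len(last)
    keys = [unify(c) for c in last]
    # C[k]: number of rows whose unified first character is smaller than k.
    counts = Counter(keys)
    C, total = {}, 0
    for k in sorted(counts):
        C[k] = total
        total += counts[k]
    # LF mapping: rows sharing a unified last char keep their relative order.
    lf, seen = [0] * n, {}
    for i, k in enumerate(keys):
        lf[i] = C[k] + seen.get(k, 0)
        seen[k] = seen.get(k, 0) + 1
    # Walk backwards from row 0, which holds the sentinel suffix.
    sa, chars = [0] * n, []
    row, pos = 0, n - 1
    for _ in range(n):
        sa[row] = pos
        chars.append(last[row])  # original character at position pos - 1
        row, pos = lf[row], pos - 1
    text = "".join(reversed(chars[:-1]))  # drop the trailing sentinel
    return text, sa


def find_all(text: str, sa: list[int], pattern: str) -> list[int]:
    """Case-insensitive search: binary search over the unified suffix array."""
    u = "".join(unify(c) for c in text) + SENTINEL
    p = "".join(unify(c) for c in pattern)
    m = len(p)

    def bound(strict: bool) -> int:
        # strict=False: first row whose prefix >= p; strict=True: first row > p.
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            head = u[sa[mid]:sa[mid] + m]
            if head < p or (strict and head == p):
                lo = mid + 1
            else:
                hi = mid
        return lo

    return sorted(sa[bound(False):bound(True)])


def exact_find(text: str, sa: list[int], pattern: str) -> list[int]:
    """Exact query: filter the case-insensitive hits against the decoded text."""
    return [i for i in find_all(text, sa, pattern) if text.startswith(pattern, i)]


if __name__ == "__main__":
    original = "Banana BANANA banana"
    decoded, sa = inverse(transform(original))
    print(decoded == original)                 # True
    print(find_all(decoded, sa, "NaNa"))       # [2, 9, 16]
    print(exact_find(decoded, sa, "BANANA"))   # [7]
```

Because the decoder recovers both the original text and the unified suffix array, the exact query reduces to a filter over the case-insensitive hits. This mirrors the abstract's point that exact matches follow from case-insensitive results.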