Revisiting bounded context block-sorting transformations

  • Authors:
  • J. Shane Culpepper; Matthias Petri; Simon J. Puglisi

  • Affiliations:
  • School of Computer Science & Information Technology, RMIT University, Melbourne, VIC 3001, Australia; School of Computer Science & Information Technology, RMIT University, Melbourne, VIC 3001, Australia; School of Computer Science & Information Technology, RMIT University, Melbourne, VIC 3001, Australia

  • Venue:
  • Software—Practice & Experience
  • Year:
  • 2012

Abstract

The Burrows–Wheeler Transform (BWT) produces a permutation of a string X, denoted X∗, by sorting the n cyclic rotations of X into full lexicographical order and taking the last column of the resulting n×n matrix to be X∗. The transformation is reversible in O(n) time. In this paper, we consider an alteration to the process, called k-BWT, where rotations are only sorted to a depth k. We propose new approaches to the forward and reverse transform, and show that the methods are efficient in practice. More than a decade ago, two algorithms were independently discovered for reversing k-BWT, both of which run in O(nk) time. Two recent algorithms have lowered the bounds for the reverse transformation to O(n log k) and O(n), respectively. We examine the practical performance of these reversal algorithms. We find that the original approach is most efficient in practice, and investigate new approaches, aimed at further speeding reversal, which store precomputed context boundaries in the compressed file. By explicitly encoding the context boundaries, we present an O(n) reversal technique that is both efficient and effective. Finally, our study elucidates an inherently cache-friendly, and hitherto unobserved, behavior in the reverse k-BWT, which could lead to new applications of the k-BWT transform. In contrast to previous empirical studies, we show that the partial transform can be reversed significantly faster than the full transform, without significantly affecting compression effectiveness. Copyright © 2011 John Wiley & Sons, Ltd.
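The difference between the full and the depth-limited forward transform can be illustrated with a minimal Python sketch. It is not the authors' implementation: it materializes all n rotations, so it is suitable only for tiny inputs. The function names `rotations`, `bwt`, and `k_bwt` are hypothetical, and the sketch assumes that rotations sharing the same first k symbols keep their original text order, which Python's stable sort provides.

```python
def rotations(x):
    """Return the n cyclic rotations of x, in order of starting position."""
    return [x[i:] + x[:i] for i in range(len(x))]


def bwt(x):
    """Full transform: sort rotations lexicographically, take the last column."""
    rows = sorted(rotations(x))
    return "".join(row[-1] for row in rows)


def k_bwt(x, k):
    """Depth-k transform: compare only the first k symbols of each rotation.

    Assumption: ties are resolved by starting position in x. Python's
    sorted() is stable, so rotations with equal k-prefixes stay in the
    order produced by rotations().
    """
    rows = sorted(rotations(x), key=lambda row: row[:k])
    return "".join(row[-1] for row in rows)


if __name__ == "__main__":
    print(bwt("banana"))       # nnbaaa
    print(k_bwt("banana", 1))  # bnnaaa
    print(k_bwt("banana", 2))  # nbnaaa
```

For this input, sorting to depth k = n gives the same output as the full transform, while smaller k leaves rotations within each k-symbol context in text order.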