p-Suffix sorting as arithmetic coding

  • Authors:
  • Richard Beal;Donald Adjeroh

  • Affiliations:
  • West Virginia University, Lane Department of Computer Science and Electrical Engineering, Morgantown, WV 26506, United States;West Virginia University, Lane Department of Computer Science and Electrical Engineering, Morgantown, WV 26506, United States

  • Venue:
  • Journal of Discrete Algorithms
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Visualization

Abstract

The challenge of direct parameterized suffix sorting (p-suffix sorting) for a parameterized string (p-string), say T of length-n, is the dynamic nature of the n parameterized suffixes (p-suffixes) of T. In this work, we propose transformative approaches to direct p-suffix sorting by generating and sorting lexicographically numeric fingerprints and arithmetic codes that correspond to individual p-suffixes. Our algorithm to p-suffix sort via fingerprints is the first theoretical linear time algorithm for p-suffix sorting for non-binary parameter alphabets, which assumes that, in practice, all codes are within the range of an integral data type. We eliminate the key problems of fingerprints by introducing an algorithm that exploits the ordering of arithmetic codes to sort p-suffixes in linear time on average. The arithmetic coding approach is further extended to handle p-strings in the worst case. This algorithm is the first direct p-suffix sorting approach in theory to execute in o(n^2) time in the worst case, which improves on the best known theoretical result on this problem that sorts p-suffixes based on p-suffix classifications in O(n^2) time. We show that, based on the algorithmic parameters and the input data, our algorithm does indeed execute in linear time in various cases, which is confirmed with experimental results.