Towards an Optimal Bit-Reversal Permutation Program

  • Authors:
  • Larry Carter;Kang Su Gatlin

  • Affiliations:
  • -;-

  • Venue:
  • FOCS '98 Proceedings of the 39th Annual Symposium on Foundations of Computer Science
  • Year:
  • 1998

Abstract

The speed of many computations is limited not by the number of arithmetic operations but by the time it takes to move and rearrange data in the increasingly complicated memory hierarchies of modern computers. Array transpose and the bit-reversal permutation -- trivial operations on a RAM -- present non-trivial problems when designing highly-tuned scientific library functions, particularly for the Fast Fourier Transform. We prove a precise bound for RoCol, a simple pebble-type game that is relevant to implementing these permutations. We use RoCol to give lower bounds on the amount of memory traffic in a computer with four levels of memory (registers, cache, TLB, and memory), taking into account such "messy" features as block moves and set-associative caches. The insights from this analysis lead to a bit-reversal algorithm whose performance is close to the theoretical minimum. Experiments show it performs significantly better than every program in a comprehensive study of 30 published algorithms.
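For readers unfamiliar with the operation, the following is a minimal sketch of the textbook definition of the bit-reversal permutation. It is not the algorithm described in the paper. The function names `reverse_bits` and `bit_reversal_permutation` are illustrative assumptions, and the array length is assumed to be a power of two. Each element at index i moves to the index obtained by reversing the bits of i, so consecutive reads produce writes scattered across the array, which is the access pattern that makes the operation costly in a memory hierarchy.

```python
def reverse_bits(i: int, nbits: int) -> int:
    """Return i with its lowest nbits bits in reversed order."""
    r = 0
    for _ in range(nbits):
        r = (r << 1) | (i & 1)  # shift the lowest bit of i into r
        i >>= 1
    return r


def bit_reversal_permutation(a: list) -> list:
    """Naive out-of-place version: b[reverse_bits(i)] = a[i].

    Assumes len(a) is a power of two (illustrative restriction).
    """
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    nbits = n.bit_length() - 1  # log2(n)
    b = [None] * n
    for i in range(n):
        # Sequential reads from a, scattered writes into b.
        b[reverse_bits(i, nbits)] = a[i]
    return b
```

For example, `bit_reversal_permutation(list(range(8)))` returns `[0, 4, 2, 6, 1, 5, 3, 7]`, and applying the function twice restores the original order.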