Fast Bit-Reversals on Uniprocessors and Shared-Memory Multiprocessors

  • Authors:
  • Zhao Zhang;Xiaodong Zhang

  • Affiliations:
  • -;-

  • Venue:
  • SIAM Journal on Scientific Computing
  • Year:
  • 2000

Quantified Score

Hi-index 0.00

Visualization

Abstract

In this paper, we examine different methods using techniques of blocking, buffering, and padding for efficient implementations of bit-reversals. We evaluate the merits and limits of each technique and its application and architecture-dependent conditions for developing cache-optimal methods. Besides testing the methods on different uniprocessors, we conducted both simulation and measurements on two commercial symmetric multiprocessors (SMP) to provide architectural insights into the methods and their implementations. We present two contributions in this paper: (1) Our integrated blocking methods, which match cache associativity and translation-lookaside buffer (TLB) cache size and which fully use the available registers, are cache-optimal and fast. (2) We show that our padding methods outperform other software-oriented methods, and we believe they are the fastest in terms of minimizing both CPU and memory access cycles. Since the padding methods are almost independent of hardware, they could be widely used on many uniprocessor workstations and multiprocessors.