Computational power of pipelined memory hierarchies

  • Authors:
  • Gianfranco Bilardi;Kattamuri Ekanadham;Pratap Pattnaik

  • Affiliations:
  • Dip. Elettronica e Informatica, Università di Padova, Padova, Italy;T.J. Watson Research Center, IBM, Yorktown Heights, NY;T.J. Watson Research Center, IBM, Yorktown Heights, NY

  • Venue:
  • Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
  • Year:
  • 2001

Abstract

We define a model of computation, called the Pipelined Hierarchical Random Access Machine with access function a(x), denoted the a(x)-PH-RAM. In this model, a processor interacts with a memory which can accept requests at a constant rate and satisfy each request to location x within a(x) units of time.
We investigate memory management strategies that lead to time-efficient implementations of arbitrary computations on a PH-RAM. We begin by developing the so-called pipelined decomposition-tree memory management strategy, which can be tuned to the memory access function. Specifically, for a linear or sublinear access function a(x), we define the concept of latency-hiding depth d_a(x) and show how any computation of N operations can be implemented on an a(x)-PH-RAM in time T(N) = O(N d_a(N)). In particular, T(N) = O(N log N) if a(x) = O(x), T(N) = O(N log log N) if a(x) = O(x^β) with 0 < β < 1, and T(N) = O(N log* N) if a(x) = O(log x).
We develop lower bound techniques that allow us to establish existential lower bounds on PH-RAMs. In particular, we exhibit computations for which T(N) = Ω(N log N / log log N) when a(x) = Ω(x), T(N) = Ω(N log log N) when a(x) = Ω(x^β) with 0 < β < 1, and T(N) = Ω(N log* N) when a(x) = Ω(log x).
The stated lower bounds show that the pipelined decomposition-tree strategy is existentially optimal for the latter case but indicate the potential for a modest, O(log log N) improvement for linear access functions. To realize this potential, a superpipelined decomposition-tree memory manager is proposed, which achieves T(N) = O(N log N / log log N).
The pipelined decomposition-tree strategy can also be tuned to the computation, in order to exploit its temporal locality as characterized by the width parameters [9]. When the latter are suitably bounded, T(N) = O(N) on any PH-RAM with linear or sublinear access function.
Finally, we discuss how performance could benefit from parallelism in the data-dependence dag of the computation or from architectural enhancements, such as block-transfer primitives, and formulate various questions that deserve further investigation.
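
The effect of pipelining in the memory model can be illustrated with a small timing sketch. It is not taken from the paper: the function names are hypothetical, it assumes that all requests are independent (no data dependences between them), that one request is issued per time unit, and that a request to location x completes exactly a(x) time units after it is issued, whereas the model only promises completion within a(x).

```python
from typing import Callable, Sequence


def pipelined_time(addresses: Sequence[int], a: Callable[[int], float]) -> float:
    """Completion time when requests are issued at times 0, 1, 2, ...

    Assumption (not from the paper): requests are independent, so the
    processor never waits for an earlier reply before issuing the next one.
    """
    finish = 0.0
    for issue_time, x in enumerate(addresses):
        # The request to x is answered a(x) time units after it is issued.
        finish = max(finish, issue_time + a(x))
    return finish


def sequential_time(addresses: Sequence[int], a: Callable[[int], float]) -> float:
    """Completion time without pipelining: each request waits for the previous one."""
    return sum(a(x) for x in addresses)


if __name__ == "__main__":
    linear = lambda x: x          # a(x) = x, the linear access function
    addrs = list(range(1, 9))     # locations 1..8, hypothetical access pattern
    print(pipelined_time(addrs, linear))   # 15.0
    print(sequential_time(addrs, linear))  # 36
```

For these eight independent requests, the pipelined cost is dominated by the single largest latency plus the issue time, while the non-pipelined cost is the sum of all latencies. Real computations have data dependences that limit how many requests can be in flight, which is the difficulty the memory management strategies in the abstract address.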