An Address Dependence Model of Computation for Hierarchical Memories with Pipelined Transfer

  • Authors:
  • Gianfranco Bilardi; Kattamuri Ekanadham; Pratap Pattnaik

  • Affiliations:
  • T.J. Watson Research Center, IBM, Yorktown Heights, NY / Università di Padova, Italy; T.J. Watson Research Center, IBM, Yorktown Heights, NY; T.J. Watson Research Center, IBM, Yorktown Heights, NY

  • Venue:
  • IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 8 - Volume 09
  • Year:
  • 2005

Abstract

Powerful memory models, including hierarchies with block transfer or with pipelining of accesses, have been proposed in theory and are partially realized in commercial systems, to reduce the average memory overhead per operation. However, even for such powerful models, there are simple direct-flow programs, with no branches and no indirect addressing, that require non-constant overhead, resulting in superlinear execution time. Indeed, we characterize a wide, natural class of machines, including nearly all previously proposed models, and develop a technique which yields superlinear time lower bounds on any machine of this class, for suitable direct-flow computations. We propose the Address Dependence Model (ADM) for machines with pipelined memory hierarchies, where any direct-flow program runs in time linear in the number of executed instructions. As an example of the capabilities of ADM for algorithms not amenable to direct-flow formulation, we show how to implement quicksort in time proportional to the number of executed comparisons, whose expected value is O(n log n), even on memories where the latency of address x is a(x) = Θ(x). (In contrast, T = Θ(n log^2 n) for sorting in the block transfer model of [Hierarchical memory with block transfer].) Finally, we consider the question of physical implementation of ADM and propose an extensible machine design, in which the number of gates and the length of wire that a signal traverses in one clock period are, within a given technology, independent of system size. Such designs scale with system size (in particular, with memory latency) as well as with technological advancement. We assume aggressive, but feasible [Optimal organizations for pipelined hierarchical memories], hierarchical memories pipelinable at a constant rate. The main contribution is a novel processor organization capable of fully exploiting such memories.
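As a rough illustration of the comparison-count claim only (not of the ADM machine or its pipelined memory implementation, which the paper itself develops), the following sketch counts element-to-pivot comparisons in a standard randomized quicksort; the expected count is O(n log n), roughly 2n ln n. All names here are hypothetical helpers, not from the paper.

```python
import random

def quicksort_count(a):
    """Randomized quicksort; returns (sorted list, element-to-pivot comparison count).

    Three-way partitioning is used; each element examined against the pivot
    is counted as one comparison (the standard measure for quicksort analysis).
    """
    comparisons = 0

    def sort(xs):
        nonlocal comparisons
        if len(xs) <= 1:
            return xs
        pivot = random.choice(xs)
        less, equal, greater = [], [], []
        for x in xs:
            comparisons += 1
            if x < pivot:
                less.append(x)
            elif x > pivot:
                greater.append(x)
            else:
                equal.append(x)
        return sort(less) + equal + sort(greater)

    return sort(list(a)), comparisons

if __name__ == "__main__":
    random.seed(0)
    n = 1000
    result, count = quicksort_count(list(range(n, 0, -1)))
    # Expected count is about 2 * n * ln(n) ~ 13800 for n = 1000.
    print(result == list(range(1, n + 1)), count)
```

The paper's contribution is that, in ADM, this comparison count also bounds the running time up to a constant factor, even when the latency of address x grows as Θ(x); the sketch above only demonstrates the O(n log n) expected comparison bound itself.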