The Uniform Memory Hierarchy Model of Computation

  • Authors:
  • Bowen Alpern; Larry Carter; Ephraim Feig; Ted Selker


  • Year:
  • 1993

Abstract

The Uniform Memory Hierarchy (UMH) model introduced in this paper captures performance-relevant aspects of the hierarchical nature of computer memory. It is used to quantify architectural requirements of several algorithms and to ratify the faster speeds achieved by tuned implementations that use improved data-movement strategies.

A sequential computer's memory is modelled as a sequence [M_0, M_1, ...] of increasingly large memory modules. Computation takes place in M_0. Thus, M_0 might model a computer's central processor, while M_1 might be cache memory, M_2 main memory, and so on. For each module M_u, a bus B_u connects it with the next larger module M_{u+1}. All buses may be active simultaneously. Data is transferred along a bus in fixed-size blocks. The size of these blocks, the time required to transfer a block, and the number of blocks that fit in a module are larger for modules farther from the processor. The UMH model is parameterized by the rate at which the blocksizes increase and by the ratio of the blockcount to the blocksize. A third parameter, the transfer cost (inverse bandwidth) function, determines the time to transfer blocks at different levels of the hierarchy.

UMH analysis refines traditional methods of algorithm analysis by including the cost of data movement throughout the memory hierarchy. The communication efficiency of a program is a ratio measuring the portion of UMH running time during which M_0 is active. An algorithm that can be implemented by a program with positive asymptotic communication efficiency is said to be communication-efficient. The communication efficiency of a program depends on the parameters of the UMH model, most importantly on the transfer cost function. A threshold function separates those transfer cost functions for which an algorithm is communication-efficient from those that are too costly. Threshold functions for matrix transpose, standard matrix multiplication, and Fast Fourier Transform algorithms are established by exhibiting communication-efficient programs at the threshold and showing that more expensive transfer cost functions are too costly.

A parallel computer can be modelled as a tree of memory modules with computation occurring at the leaves. Threshold functions are established for multiplication of N x N matrices using up to N^2 processors in a tree with constant branching factor.
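
For concreteness, the parameterization described in the abstract can be sketched in the UMH_{alpha, rho, f(.)} notation commonly used for this model; the symbols below are labels chosen here for illustration, not quoted from the paper:

    % Assumed notation: rho = rate at which blocksizes grow,
    % alpha = ratio of blockcount to blocksize,
    % f(u)  = transfer cost (inverse bandwidth) at level u.
    s_u   = \rho^u                              % size of a block on bus B_u
    n_u   = \alpha \cdot \rho^u                 % number of blocks held by module M_u
    |M_u| = n_u \cdot s_u = \alpha \rho^{2u}    % total capacity of M_u
    t_u   = s_u \cdot f(u) = \rho^u f(u)        % time to move one block across B_u

Under this reading, a program's communication efficiency is the fraction of its UMH running time during which M_0 is active,

    \epsilon \;=\; \frac{T_{M_0\ \mathrm{active}}}{T_{\mathrm{UMH}}},

and an algorithm is communication-efficient if it admits programs whose efficiency stays bounded away from zero as the problem size grows. The threshold results of the paper are then statements about which growth rates of the transfer cost function f(u) still permit such programs for matrix transpose, standard matrix multiplication, and the Fast Fourier Transform.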