A model of computation for VLSI with related complexity results
Journal of the ACM (JACM)
A model for hierarchical memory
STOC '87 Proceedings of the nineteenth annual ACM symposium on Theory of computing
Cache and memory hierarchy design: a performance-directed approach
General purpose parallel architectures
Handbook of theoretical computer science (vol. A)
Horizons of parallel computation
Journal of Parallel and Distributed Computing
ICS '90 Proceedings of the 4th international conference on Supercomputing
Guest Editors' Introduction: Cache Memory and Related Problems: Enhancing and Exploiting the Locality
IEEE Transactions on Computers - Special issue on cache memory and related problems
Dynamic storage allocation in the Atlas computer, including an automatic use of a backing store
Communications of the ACM
Computational power of pipelined memory hierarchies
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Models of Computation: Exploring the Power of Computing
High Performance Compilers for Parallel Computing
The Ultrascalar Processor: An Asymptotically Scalable Superscalar Microarchitecture
ARVLSI '99 Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Predicting Performance on SMPs: A Case Study: The SGI Power Challenge
IPDPS '00 Proceedings of the 14th International Symposium on Parallel and Distributed Processing
Pipelined Memory Hierarchies: Scalable Organizations and Application Performance
IWIA '01 Proceedings of the Innovative Architecture for Future Generation High-Performance Processors and Systems (IWIA'01)
An Approach towards an Analytical Characterization of Locality and its Portability
IWIA '01 Proceedings of the Innovative Architecture for Future Generation High-Performance Processors and Systems (IWIA'01)
Area-Efficient VLSI Computation
An Address Dependence Model of Computation for Hierarchical Memories with Pipelined Transfer
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 8 - Volume 09
Translating submachine locality into locality of reference
Journal of Parallel and Distributed Computing - Special issue: 18th International parallel and distributed processing symposium
Models for parallel and hierarchical computation
Proceedings of the 4th international conference on Computing frontiers
On approximating the ideal random access machine by physical machines
Journal of the ACM (JACM)
psort, Yet Another Fast Stable Sorting Software
SEA '09 Proceedings of the 8th International Symposium on Experimental Algorithms
Evaluating multicore algorithms on the unified memory model
Scientific Programming - Software Development for Multi-core Computing Systems
psort, Yet Another Fast Stable Sorting Software
Journal of Experimental Algorithmics (JEA)
In a recent paper (SPAA'01), we established that the Pipelined Hierarchical Random Access Machine (PH-RAM) is a powerful model of computation, in which most of the memory latency can be hidden by concurrency of accesses. In the present work, we explore the physical feasibility of PH-RAMs.

A pipelined hierarchical memory of size $S$ is characterized by two metrics: the access function $\alpha(x)$, denoting the time required by an access to location $x$, and the pipeline period $p(S)$, denoting the minimum time between subsequent accesses that can be sustained. Physical constraints on minimum device size and maximum signal speed imply that, for a memory laid out in $d$ dimensions, $\alpha(x) = \Omega(x^{1/d})$. We propose a novel memory organization scheme that can be specialized to yield optimal performance $\alpha(x) = O(x^{1/d})$ and $p(S) = O(1)$, for any $d \geq 1$.

Managing a large number of concurrent load and store instructions would place a significant burden on a traditional RISC processor, requiring both a large register file and complex logic to synchronize instructions properly. We show how these obstacles can be circumvented by introducing the Scalable transPORT (SPORT) computer, in which a simple processor drives a version of our pipelined hierarchical memory capable of servicing memory-to-memory instructions. We show that SPORT provides a feasible, scalable implementation of the PH-RAM model.
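The lower bound $\alpha(x) = \Omega(x^{1/d})$ quoted above follows from a standard volume-packing argument. A sketch, under the assumptions stated in the abstract (the constants $v_0$, $c$, and $k_d$ below are illustrative, not from the paper):

```latex
% Assume each memory cell occupies volume at least v_0 > 0 and that
% signals propagate at speed at most c. In a d-dimensional layout,
% cells 1, ..., x together occupy volume at least x * v_0, so they
% cannot all fit inside a ball of radius r around the processor
% unless k_d * r^d >= x * v_0 (k_d = volume of the unit d-ball).
\begin{align*}
  \text{Hence some cell among } 1,\dots,x \text{ lies at distance }
    r &\ge \left(\frac{x\,v_0}{k_d}\right)^{1/d} = \Omega\!\left(x^{1/d}\right)
    \text{ from the processor,}\\
  \text{and a round-trip access to it takes time }
    \alpha(x) &\ge \frac{2r}{c} = \Omega\!\left(x^{1/d}\right).
\end{align*}
```

The matching upper bound $\alpha(x) = O(x^{1/d})$ with $p(S) = O(1)$ is what the proposed memory organization achieves, making the access function asymptotically optimal for any fixed $d \geq 1$.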