The capability of the Random Access Machine (RAM) to execute any instruction in constant time is not realizable, due to fundamental physical constraints on the minimum size of devices and the maximum speed of signals. This work explores how well ideal RAM performance can be approximated, for significant classes of computations, by machines whose building blocks have constant size and are connected at constant distance. A novel memory structure is proposed that is both pipelined (it can accept a new request at each cycle) and hierarchical, exhibiting optimal latency a(x) = O(x^(1/d)) to address x in d-dimensional realizations. In spite of block-transfer or other memory-pipeline capabilities, a number of previous machine models do not achieve a full overlap of memory accesses; these are examples of machines with explicit data movement. It is shown that there are direct-flow computations (without branches and indirect accesses) that require time superlinear in the number of instructions on all such machines. To circumvent the explicit-data-movement constraints, the Speculative Prefetcher (SP) and the Speculative Prefetcher and Evaluator (SPE) processors are developed. Both processors can execute any direct-flow program in linear time. The SPE also executes in linear time a class of loop programs that includes many significant algorithms; even quicksort, a somewhat irregular, recursive algorithm, admits a linear-time SPE implementation. Finally, a relation between instructions called address dependence is introduced, which limits memory-access overlap and can lead to superlinear time, as illustrated with the classical merging algorithm.
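The latency bound a(x) = O(x^(1/d)) can be illustrated numerically. The sketch below is only a hedged reading of the asymptotic claim: the function name `latency` and the constant factor `c` are hypothetical, not from the paper; what the model fixes is the growth rate x^(1/d).

```python
def latency(x, d, c=1.0):
    """Access latency to address x in a d-dimensional realization.

    Models the bound a(x) = O(x^(1/d)); the multiplicative
    constant c is a placeholder, not specified by the model.
    """
    return c * x ** (1.0 / d)

# In a 2-dimensional realization, addresses 100x farther away
# cost only 10x more latency: growth is the square root of x.
ratio_2d = latency(10000, 2) / latency(100, 2)
```

Note the contrast with the ideal RAM, where latency is constant regardless of x; the model replaces that physically unrealizable assumption with a distance-dependent cost.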
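The address-dependence phenomenon mentioned for the classical merging algorithm can be made concrete with a small sketch (the tracing function below is illustrative, not the paper's formalism): in a merge, the next address to be read is determined by the outcome of the previous comparison, so the address trace cannot be computed ahead of the data, which is exactly what limits memory-access overlap.

```python
def merge_address_trace(a, b):
    """Return the sequence of (array, index) reads made by the
    classical merge of two sorted lists.

    Each comparison a[i] <= b[j] decides which address is read
    next: an address dependence. In a direct-flow program, by
    contrast, the trace would be fixed before execution and
    could be fully prefetched.
    """
    i = j = 0
    trace = []
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:          # outcome determines the next address
            trace.append(("a", i))
            i += 1
        else:
            trace.append(("b", j))
            j += 1
    trace += [("a", k) for k in range(i, len(a))]
    trace += [("b", k) for k in range(j, len(b))]
    return trace
```

Running this on different inputs of the same sizes yields different traces, which is the defining property that separates such computations from the direct-flow class executed in linear time by the SP and SPE processors.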