The capability of the Random Access Machine (RAM) to execute any instruction in constant time is not realizable, due to fundamental physical constraints on the minimum size of devices and the maximum speed of signals. This work explores how well ideal RAM performance can be approximated, for significant classes of computations, by machines whose building blocks have constant size and are connected at constant distance. A novel memory structure is proposed that is both pipelined (it can accept a new request at each cycle) and hierarchical, exhibiting optimal latency a(x) = O(x^(1/d)) to address x in d-dimensional realizations. In spite of block-transfer or other memory-pipeline capabilities, a number of previous machine models do not achieve a full overlap of memory accesses; these are examples of machines with explicit data movement. It is shown that there are direct-flow computations (without branches and indirect accesses) that require time superlinear in the number of instructions on all such machines. To circumvent the explicit-data-movement constraints, the Speculative Prefetcher (SP) and the Speculative Prefetcher and Evaluator (SPE) processors are developed. Both processors can execute any direct-flow program in linear time. The SPE also executes in linear time a class of loop programs that includes many significant algorithms; even quicksort, a somewhat irregular, recursive algorithm, admits a linear-time SPE implementation. Finally, a relation between instructions called address dependence is introduced, which limits memory-access overlap and can lead to superlinear time, as illustrated with the classical merging algorithm.
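The latency bound a(x) = O(x^(1/d)) can be illustrated numerically. The sketch below is only a hedged reading of the asymptotic claim: the function name `latency` and the constant factor `c` are hypothetical, not from the paper; what the model fixes is the growth rate x^(1/d).

```python
def latency(x, d, c=1.0):
    """Access latency to address x in a d-dimensional realization.

    Models the bound a(x) = O(x^(1/d)); the multiplicative
    constant c is a placeholder, not specified by the model.
    """
    return c * x ** (1.0 / d)

# In a 2-dimensional realization, addresses 100x farther away
# cost only 10x more latency: growth is the square root of x.
ratio_2d = latency(10000, 2) / latency(100, 2)
```

Note the contrast with the ideal RAM, where latency is constant regardless of x; the model replaces that physically unrealizable assumption with a distance-dependent cost.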
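The address-dependence phenomenon mentioned for the classical merging algorithm can be made concrete with a small sketch (the tracing function below is illustrative, not the paper's formalism): in a merge, the next address to be read is determined by the outcome of the previous comparison, so the address trace cannot be computed ahead of the data, which is exactly what limits memory-access overlap.

```python
def merge_address_trace(a, b):
    """Return the sequence of (array, index) reads made by the
    classical merge of two sorted lists.

    Each comparison a[i] <= b[j] decides which address is read
    next: an address dependence. In a direct-flow program, by
    contrast, the trace would be fixed before execution and
    could be fully prefetched.
    """
    i = j = 0
    trace = []
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:          # outcome determines the next address
            trace.append(("a", i))
            i += 1
        else:
            trace.append(("b", j))
            j += 1
    trace += [("a", k) for k in range(i, len(a))]
    trace += [("b", k) for k in range(j, len(b))]
    return trace
```

Running this on different inputs of the same sizes yields different traces, which is the defining property that separates such computations from the direct-flow class executed in linear time by the SP and SPE processors.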