A model of computation for VLSI with related complexity results
Journal of the ACM (JACM)
A model for hierarchical memory
STOC '87 Proceedings of the nineteenth annual ACM symposium on Theory of computing
Cache and memory hierarchy design: a performance-directed approach
General purpose parallel architectures
Handbook of theoretical computer science (vol. A)
Horizons of parallel computation
Journal of Parallel and Distributed Computing
ICS '90 Proceedings of the 4th international conference on Supercomputing
Guest Editors' Introduction: Cache Memory and Related Problems: Enhancing and Exploiting the Locality
IEEE Transactions on Computers - Special issue on cache memory and related problems
Dynamic storage allocation in the Atlas computer, including an automatic use of a backing store
Communications of the ACM
Computational power of pipelined memory hierarchies
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
Models of Computation: Exploring the Power of Computing
High Performance Compilers for Parallel Computing
The Ultrascalar Processor: An Asymptotically Scalable Superscalar Microarchitecture
ARVLSI '99 Proceedings of the 20th Anniversary Conference on Advanced Research in VLSI
FOCS '99 Proceedings of the 40th Annual Symposium on Foundations of Computer Science
Predicting Performance on SMPs: A Case Study: The SGI Power Challenge
IPDPS '00 Proceedings of the 14th International Symposium on Parallel and Distributed Processing
Pipelined Memory Hierarchies: Scalable Organizations and Application Performance
IWIA '01 Proceedings of the Innovative Architecture for Future Generation High-Performance Processors and Systems (IWIA'01)
An Approach towards an Analytical Characterization of Locality and its Portability
IWIA '01 Proceedings of the Innovative Architecture for Future Generation High-Performance Processors and Systems (IWIA'01)
Area-Efficient VLSI Computation
An Address Dependence Model of Computation for Hierarchical Memories with Pipelined Transfer
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Workshop 8 - Volume 09
Translating submachine locality into locality of reference
Journal of Parallel and Distributed Computing - Special issue: 18th International parallel and distributed processing symposium
Models for parallel and hierarchical computation
Proceedings of the 4th international conference on Computing frontiers
On approximating the ideal random access machine by physical machines
Journal of the ACM (JACM)
psort, Yet Another Fast Stable Sorting Software
SEA '09 Proceedings of the 8th International Symposium on Experimental Algorithms
Evaluating multicore algorithms on the unified memory model
Scientific Programming - Software Development for Multi-core Computing Systems
psort, Yet Another Fast Stable Sorting Software
Journal of Experimental Algorithmics (JEA)
In a recent paper (SPAA'01), we established that the Pipelined Hierarchical Random Access Machine (PH-RAM) is a powerful model of computation, in which most of the memory latency can be hidden by concurrency of accesses. In the present work, we explore the physical feasibility of PH-RAMs.

A pipelined hierarchical memory of size $S$ is characterized by two metrics: the access function $\alpha(x)$, denoting the time required by an access to location $x$, and the pipeline period $p(S)$, denoting the minimum time between subsequent accesses that can be sustained. Physical constraints on minimum device size and maximum signal speed imply that, for a memory laid out in $d$ dimensions, $\alpha(x) = \Omega(x^{1/d})$. We propose a novel memory organization scheme that can be specialized to yield optimal performance $\alpha(x) = O(x^{1/d})$ and $p(S) = O(1)$, for any $d \geq 1$.

Managing a large number of concurrent load and store instructions would place a significant burden on a traditional RISC processor, requiring both a large register file and complex logic to synchronize instructions properly. We show how these obstacles can be circumvented by introducing the Scalable transPORT (SPORT) computer, in which a simple processor drives a version of our pipelined hierarchical memory capable of servicing memory-to-memory instructions. We show that SPORT provides a feasible, scalable implementation of the PH-RAM model.
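The lower bound $\alpha(x) = \Omega(x^{1/d})$ quoted above follows from a standard volume-packing argument. A sketch, under the assumptions stated in the abstract (the constants $v_0$, $c$, and $k_d$ below are illustrative, not from the paper):

```latex
% Assume each memory cell occupies volume at least v_0 > 0 and that
% signals propagate at speed at most c. In a d-dimensional layout,
% cells 1, ..., x together occupy volume at least x * v_0, so they
% cannot all fit inside a ball of radius r around the processor
% unless k_d * r^d >= x * v_0 (k_d = volume of the unit d-ball).
\begin{align*}
  \text{Hence some cell among } 1,\dots,x \text{ lies at distance }
    r &\ge \left(\frac{x\,v_0}{k_d}\right)^{1/d} = \Omega\!\left(x^{1/d}\right)
    \text{ from the processor,}\\
  \text{and a round-trip access to it takes time }
    \alpha(x) &\ge \frac{2r}{c} = \Omega\!\left(x^{1/d}\right).
\end{align*}
```

The matching upper bound $\alpha(x) = O(x^{1/d})$ with $p(S) = O(1)$ is what the proposed memory organization achieves, making the access function asymptotically optimal for any fixed $d \geq 1$.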