DRAMs containing cache memory are studied in the context of vector supercomputers. In particular, we consider systems where processors have no internal data caches and memory reference streams are generated by vector instructions. For this application, we expect that cached DRAMs can provide high bandwidth at relatively low cost. We study both DRAMs with a single, long cache line and DRAMs with multiple, smaller cache lines. Memory interleaving schemes that increase data locality are proposed and studied. These interleaving schemes are also shown to lead to non-uniform bank accesses, i.e., hot banks. This suggests an important optimization problem: interleaving should increase locality enough to improve performance, but not so much that hot banks diminish it. We show that for uniprocessor systems, both types of cached DRAMs work well with the proposed interleave methods. For multiprogrammed multiprocessors, the multiple cache line DRAMs work better.
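
The tension between locality and hot banks can be illustrated with a small sketch. The abstract does not specify the proposed interleaving schemes, so the code below assumes a generic block interleaving (a hypothetical `block_words` consecutive words per bank before moving to the next bank), a single cached line per bank, and two interleaved unit-stride vector streams. All names and parameter values are illustrative assumptions, not the paper's method.

```python
from collections import Counter


def map_address(addr, num_banks, block_words, line_words):
    """Map a word address to (bank, line index within that bank).

    Assumed scheme: block_words consecutive words stay in one bank;
    block_words=1 is ordinary word interleaving.
    """
    block, offset = divmod(addr, block_words)
    bank = block % num_banks
    local = (block // num_banks) * block_words + offset  # address inside the bank
    return bank, local // line_words


def simulate(addresses, num_banks, block_words, line_words, window):
    """Return (cache line hits, peak accesses to one bank per window).

    Each bank is modeled with a single cached line (an assumption).
    """
    cached = {}  # bank -> line index currently held in that bank's cache
    hits = 0
    banks = []
    for addr in addresses:
        bank, line = map_address(addr, num_banks, block_words, line_words)
        banks.append(bank)
        if cached.get(bank) == line:
            hits += 1
        cached[bank] = line
    # Hot-bank measure: the busiest bank in any non-overlapping window.
    peak = max(
        max(Counter(banks[i:i + window]).values())
        for i in range(0, len(banks), window)
    )
    return hits, peak


def two_streams(base_a, base_b, length):
    """Alternate accesses from two unit-stride vector streams."""
    for i in range(length):
        yield base_a + i
        yield base_b + i


stream = list(two_streams(0, 1040, 64))
word = simulate(stream, num_banks=4, block_words=1, line_words=16, window=8)
block = simulate(stream, num_banks=4, block_words=16, line_words=16, window=8)
print("word interleaving  (hits, peak):", word)
print("block interleaving (hits, peak):", block)
```

Under these assumed parameters, word interleaving yields 0 line hits out of 128 accesses, with at most 2 of every 8 accesses landing on one bank. Block interleaving yields 120 hits, but 4 of every 8 accesses land on a single bank. This is the trade-off the abstract describes: more locality, at the price of less uniform bank use.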