Hardware-only stream prefetching and dynamic access ordering
Proceedings of the 14th international conference on Supercomputing
Proceedings of the 27th annual international symposium on Computer architecture
Algorithmic foundations for a parallel vector access memory system
Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
Dynamic Access Ordering for Streamed Computations
IEEE Transactions on Computers
Designing a Modern Memory Hierarchy with Hardware Prefetching
IEEE Transactions on Computers
Tiling, Block Data Layout, and Memory Hierarchy Performance
IEEE Transactions on Parallel and Distributed Systems
Design and Implementation of High-Performance Memory Systems for Future Packet Buffers
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Memory Controller Optimizations for Web Servers
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
A DRAM/SRAM Memory Scheme for Fast Packet Buffers
IEEE Transactions on Computers
Efficient address remapping in distributed shared-memory systems
ACM Transactions on Architecture and Code Optimization (TACO)
SBCCI '06 Proceedings of the 19th annual symposium on Integrated circuits and systems design
The design space of data-parallel memory systems
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Virtually Pipelined Network Memory
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Impulse: Memory system support for scientific applications
Scientific Programming
Designing packet buffers for router linecards
IEEE/ACM Transactions on Networking (TON)
High-bandwidth Address Generation Unit
Journal of Signal Processing Systems
High-Performance Buffer Mapping to Exploit DRAM Concurrency in Multiprocessor DSP Systems
RSP '09 Proceedings of the 2009 IEEE/IFIP International Symposium on Rapid System Prototyping
High-bandwidth network memory system through virtual pipelines
IEEE/ACM Transactions on Networking (TON)
SAMS multi-layout memory: providing multiple views of data to boost SIMD performance
Proceedings of the 24th ACM International Conference on Supercomputing
High-performance and low-energy buffer mapping method for multiprocessor DSP systems
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.01 |
The focus of this paper is on designing both a low cost and high performance, high bandwidth vector memory system that takes advantage of modern commodity SDRAM memory chips. To successfully extract the full bandwidth from SDRAM parts, we propose a new memory system organization based on sending commands to the memory system as opposed to sending individual addresses. A command specifies, in a few bytes, a request for multiple independent memory words. A command is similar to a burst found in DRAM memories, but does not require the memory words to be consecutive. The command is sent to all sections of the memory array simultaneously, thus not requiring a crossbar in the proper sense. Our simulations show that this command based memory system can improve performance over a traditional SDRAM-based memory system by factors that range between 1.15 up to 1.54. Moreover, in many cases, the command memory system outperforms even the best SRAM memory system under consideration. Overall the command based memory system achieves similar or better results than a 10ns SRAM memory system (a) using fewer banks and (b) using memory devices that are between 15 to 60 times cheaper.