Design and evaluation of dynamic access ordering hardware
ICS '96 Proceedings of the 10th international conference on Supercomputing
SIGMETRICS '97 Proceedings of the 1997 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Increasing TLB reach using superpages backed by shadow memory
Proceedings of the 25th annual international symposium on Computer architecture
Design of the 21174 memory controller for DIGITAL Personal Workstations
Digital Technical Journal
A performance comparison of contemporary DRAM architectures
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
New DRAM Technologies: A Comprehensive Analysis of the New Architecture
New DRAM Technologies: A Comprehensive Analysis of the New Architecture
Microcomputer Interfacing
Lockup-free instruction fetch/prefetch cache organization
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Access ordering and memory-conscious cache utilization
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Access Order and Effective Bandwidth for Streams on a Direct Rambus Memory
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Impulse: Building a Smarter Memory Controller
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
High-Performance DRAMs in Workstation Environments
IEEE Transactions on Computers
Estimating influence of data layout optimizations on SDRAM energy consumption
Proceedings of the 2003 international symposium on Low power electronics and design
Memory Controller Optimizations for Web Servers
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
A study of performance impact of memory controller features in multi-processor server environment
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
DRAMsim: a memory system simulator
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Mini-rank: Adaptive DRAM architecture for improving memory power efficiency
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Design of an interconnect architecture and signaling technology for parallelism in communication
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Decoupled DIMM: building high-bandwidth memory system using low-speed DRAM devices
Proceedings of the 36th annual international symposium on Computer architecture
Proceedings of the 36th annual international symposium on Computer architecture
Micro-pages: increasing DRAM efficiency with locality-aware data placement
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
Rethinking DRAM design and organization for energy-constrained multi-cores
Proceedings of the 37th annual international symposium on Computer architecture
Handling the problems and opportunities posed by multiple on-chip memory controllers
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
DRAM selection and configuration for real-time mobile systems
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Hi-index | 0.00 |
Given a fixed CPU architecture and a fixed DRAM timing specification, there is still a large design space for a DRAM system organization. Parameters include the number of memory channels, the bandwidth of each channel, burst sizes, queue sizes and organizations, turnaround overhead, memory-controller page protocol, algorithms for assigning request priorities and scheduling requests dynamically, etc. In this design space, we see a wide variation in application execution times: for example, execution times for SPEC CPU 2000 integer suite on a 2-way ganged Direct Rambus organization (32 data bits) with 64-byte bursts are 10-20% lower than execution times on an otherwise identical configuration that uses 32-byte bursts. This represents two system configurations that are relatively close to each other in the design space; performance differences become even more pronounced for designs further apart.This paper characterizes the sources of overhead in high-performance DRAM systems and investigates the most effective ways to reduce a system's exposure to performance loss. In particular, we look at mechanisms to increase a system's support for concurrent transactions, mechanisms to reduce request latency, and mechanisms to reduce the “system overhead”—the portion of the primary memory system's overhead that is not due to DRAM latency but rather to things like turnaround time, request queueing, inefficiencies due to read/write request interleaving, etc. Our simulator models a 2GHz, highly aggressive out-of-order uniprocessor. The interface to the memory system is fully non-blocking, supporting up to 32 outstanding misses at both the level-1 and level-2 caches and split-transaction busses to all DRAM banks.