Memory bandwidth limitations of future microprocessors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
A performance comparison of contemporary DRAM architectures
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Proceedings of the 27th annual international symposium on Computer architecture
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Symbiotic jobscheduling with priorities for a simultaneous multithreading processor
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Access ordering and memory-conscious cache utilization
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
DRAM Energy Management Using Sof ware and Hardware Directed Power Mode Control
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Reducing DRAM Latencies with an Integrated Memory Hierarchy Design
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Adaptive History-Based Memory Schedulers
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
A Performance Comparison of DRAM Memory System Optimizations for SMT Processors
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Modern dram memory systems: performance analysis and scheduling algorithm
Modern dram memory systems: performance analysis and scheduling algorithm
The M5 Simulator: Modeling Networked Systems
IEEE Micro
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Limiting the power consumption of main memory
Proceedings of the 34th annual international symposium on Computer architecture
A Burst Scheduling Access Reordering Mechanism
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Mini-rank: Adaptive DRAM architecture for improving memory power efficiency
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Micro-pages: increasing DRAM efficiency with locality-aware data placement
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
MemScale: active low-power modes for main memory
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Adaptive granularity memory systems: a tradeoff between storage efficiency and throughput
Proceedings of the 38th annual international symposium on Computer architecture
BOOM: enabling mobile memory based low-power server DIMMs
Proceedings of the 39th Annual International Symposium on Computer Architecture
Buffer-on-board memory systems
Proceedings of the 39th Annual International Symposium on Computer Architecture
A survey of architectural techniques for DRAM power management
International Journal of High Performance Systems Architecture
CoScale: Coordinating CPU and Memory System DVFS in Server Systems
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
CPU transparent protection of OS kernel and hypervisor integrity with programmable DRAM
Proceedings of the 40th Annual International Symposium on Computer Architecture
Hi-index | 0.00 |
The widespread use of multicore processors has dramatically increased the demands on high bandwidth and large capacity from memory systems. In a conventional DDR2/DDR3 DRAM memory system, the memory bus and DRAM devices run at the same data rate. To improve memory bandwidth, we propose a new memory system design called decoupled DIMM that allows the memory bus to operate at a data rate much higher than that of the DRAM devices. In the design, a synchronization buffer is added to relay data between the slow DRAM devices and the fast memory bus; and memory access scheduling is revised to avoid access conflicts on memory ranks. The design not only improves memory bandwidth beyond what can be supported by current memory devices, but also improves reliability, power efficiency, and cost effectiveness by using relatively slow memory devices. The idea of decoupling, precisely the decoupling of bandwidth match between memory bus and a single rank of devices, can also be applied to other types of memory systems including FB-DIMM. Our experimental results show that a decoupled DIMM system of 2667MT/s bus data rate and 1333MT/s device data rate improves the performance of memory-intensive workloads by 51% on average over a conventional memory system of 1333MT/s data rate. Alternatively, a decoupled DIMM system of 1600MT/s bus data rate and 800MT/s device data rate incurs only 8% performance loss when compared with a conventional system of 1600MT/s data rate, with 16% reduction on the memory power consumption and 9% saving on memory energy.