Fully-Buffered DIMM Memory Architectures: Understanding Mechanisms, Overheads and Scaling

Authors:
Brinda Ganesh;Aamer Jaleel;David Wang;Bruce Jacob
Affiliations:
University of Maryland, College Park, MD. brinda@eng.umd.edu, blj@eng.umd.edu;VSSAD, Intel Corporation, Hudson, MA. blj@eng.umd.edu;University of Maryland, College Park, MD. blj@eng.umd.edu;University of Maryland, College Park, MD. blj@eng.umd.edu
Venue:
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Year:
2007

Citing 0
Cited 15

Thermal modeling and management of DRAM memory systems

Proceedings of the 34th annual international symposium on Computer architecture
Optimizing thread throughput for multithreaded workloads on memory constrained CMPs

Proceedings of the 5th conference on Computing frontiers
Software thermal management of dram memory for multicore systems

SIGMETRICS '08 Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Decoupled DIMM: building high-bandwidth memory system using low-speed DRAM devices

Proceedings of the 36th annual international symposium on Computer architecture
Achieving predictable performance through better memory controller placement in many-core CMPs

Proceedings of the 36th annual international symposium on Computer architecture
Allocation wall: a limiting factor of Java applications on emerging multi-core platforms

Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Scaling power/ground solvers on multi-core with memory bandwidth awareness

Proceedings of the 20th symposium on Great lakes symposium on VLSI
Simple but Effective Heterogeneous Main Memory with On-Chip Memory Controller Support

Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Coordinating processor and main memory for efficientserver power control

Proceedings of the international conference on Supercomputing
Towards energy-proportional datacenter memory with mobile DRAM

Proceedings of the 39th Annual International Symposium on Computer Architecture
A case for exploiting subarray-level parallelism (SALP) in DRAM

Proceedings of the 39th Annual International Symposium on Computer Architecture
Buffer-on-board memory systems

Proceedings of the 39th Annual International Symposium on Computer Architecture
Server-class DDR3 SDRAM memory buffer chip

IBM Journal of Research and Development
Reducing memory access latency with asymmetric DRAM bank organizations

Proceedings of the 40th Annual International Symposium on Computer Architecture
Memory-centric system interconnect design with hybrid memory cubes

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques

Quantified Score

Hi-index	0.00

Visualization

Abstract

Performance gains in memory have traditionally been obtained by increasing memory bus widths and speeds. The diminishing returns of such techniques have led to the proposal of an alternate architecture, the Fully-Buffered DIMM. This new standard replaces the conventional memory bus with a narrow, high-speed interface between the memory controller and the DIMMs. This paper examines how traditional DDRx based memory controller policies for scheduling and row buffer management perform on a Fully-buffered DIMM memory architecture. The split-bus architecture used by FBDIMM systems results in an average improvement of 7% in latency and 10% in bandwidth at higher utilizations. On the other hand, at lower utilizations, the increased cost of serialization resulted in a degradation in latency and bandwidth of 25% and 10% respectively. The split-bus architecture also makes the system performance sensitive to the ratio of read and write traffic in the workload. In larger configurations, we found that the FBDIMM system performance was more sensitive to usage of the FBDIMM links than to DRAM bank availability. In general, FBDIMM performance is similar to that of DDRx systems, and provides better performance characteristics at higher utilization, making it a relatively inexpensive mechanism for scaling capacity at higher bandwidth requirements. The mechanism is also largely insensitive to scheduling policies, provided certain ground rules are obeyed.