Disaggregated memory for expansion and sharing in blade servers

Authors and affiliations:
Kevin Lim (University of Michigan, Ann Arbor, MI, USA)
Jichuan Chang (Hewlett-Packard Labs, Palo Alto, CA, USA)
Trevor Mudge (University of Michigan, Ann Arbor, MI, USA)
Parthasarathy Ranganathan (Hewlett-Packard Labs, Palo Alto, CA, USA)
Steven K. Reinhardt (Advanced Micro Devices, Inc., Bellevue, USA)
Thomas F. Wenisch (University of Michigan, Ann Arbor, MI, USA)
Venue:
Proceedings of the 36th annual international symposium on Computer architecture
Year:
2009



Abstract

Analysis of technology and application trends reveals a growing imbalance in the peak compute-to-memory-capacity ratio for future servers. At the same time, memory systems account for an increasing fraction of total datacenter cost and power consumption during typical usage. In response to these trends, this paper re-examines the traditional co-location of compute and memory on a single system and details the design of a new general-purpose architectural building block, a memory blade, that allows memory to be "disaggregated" across a system ensemble. This remote memory blade can be used for memory capacity expansion to improve performance and for sharing memory across servers to reduce provisioning and power costs. We use this memory blade building block to propose two new system architecture solutions that enable transparent memory expansion and sharing on commodity-based systems: (1) page-swapped remote memory at the virtualization layer, and (2) block-access remote memory with support in the coherence hardware. Using simulations of a mix of enterprise benchmarks supplemented with traces from live datacenters, we demonstrate that memory disaggregation can provide substantial performance benefits (on average 10X) in memory-constrained environments, while the sharing enabled by our solutions can improve performance-per-dollar by up to 57% when optimizing memory provisioning across multiple servers.
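
The page-swapped idea named in the abstract can be illustrated with a toy model. The sketch below is not the paper's design: the class `PageSwappedMemory`, its counters, and the LRU eviction policy are all assumptions made for illustration. It treats local memory as a small page cache and sends evicted pages to a remote store standing in for the memory blade.

```python
from collections import OrderedDict


class PageSwappedMemory:
    """Toy model (hypothetical): small local memory backed by a remote blade."""

    def __init__(self, local_pages):
        self.local_pages = local_pages  # local DRAM capacity, in pages
        self.local = OrderedDict()      # page number -> data, oldest first
        self.remote = {}                # pages currently held on the blade
        self.local_hits = 0
        self.remote_fetches = 0

    def access(self, page):
        # Hit: the page is in local memory; refresh its LRU position.
        if page in self.local:
            self.local.move_to_end(page)
            self.local_hits += 1
            return self.local[page]

        # Miss: fetch from the blade if present, else allocate a new page.
        if page in self.remote:
            self.remote_fetches += 1
            data = self.remote.pop(page)
        else:
            data = f"page-{page}"  # placeholder contents (assumption)

        # Make room by swapping the least recently used page out to the blade.
        if len(self.local) >= self.local_pages:
            victim, victim_data = self.local.popitem(last=False)
            self.remote[victim] = victim_data

        self.local[page] = data
        return data


if __name__ == "__main__":
    mem = PageSwappedMemory(local_pages=2)
    for p in [0, 1, 0, 2, 1]:
        mem.access(p)
    print("local hits:", mem.local_hits)
    print("remote fetches:", mem.remote_fetches)
    print("pages on blade:", sorted(mem.remote))
```

In this model the application sees a single address space while capacity beyond `local_pages` lives on the blade, which is the sense in which the expansion is transparent. A real system would do the swap in the hypervisor and pay a transfer latency per remote fetch, neither of which is modeled here.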