A collaborative memory system for high-performance and cost-effective clustered architectures

Authors:
Ahmad Samih; Ren Wang; Christian Maciocco; Tsung-Yuan Charlie Tai; Yan Solihin
Affiliations:
System Architecture Lab, Intel Research Labs, Hillsboro, OR and North Carolina State University, Raleigh, NC; System Architecture Lab, Intel Research Labs, Hillsboro, OR; System Architecture Lab, Intel Research Labs, Hillsboro, OR; System Architecture Lab, Intel Research Labs, Hillsboro, OR; North Carolina State University, Raleigh, NC
Venue:
Proceedings of the 1st Workshop on Architectures and Systems for Big Data
Year:
2011

References (20)

Distributed operating systems. ACM Computing Surveys (CSUR) - The MIT Press scientific computation series.
Adding Flexibility to a Remote Memory Pager. IWOOOS '95 Proceedings of the 4th International Workshop on Object-Orientation in Operating Systems.
Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors. Proceedings of the 32nd annual international symposium on Computer Architecture.
Optimizing Replication, Communication, and Capacity Allocation in CMPs. Proceedings of the 32nd annual international symposium on Computer Architecture.
Cooperative Caching for Chip Multiprocessors. Proceedings of the 33rd annual international symposium on Computer Architecture.
Architectural support for operating system-driven CMP cache management. Proceedings of the 15th international conference on Parallel architectures and compilation techniques.
iWARP ethernet: key to driving ethernet into high performance environments. Proceedings of the 2006 ACM/IEEE conference on Supercomputing.
ASR: Adaptive Selective Replication for CMP Caches. Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture.
Adaptive insertion policies for high performance caching. Proceedings of the 34th annual international symposium on Computer architecture.
Implementation of a reliable remote memory pager. ATEC '96 Proceedings of the 1996 annual conference on USENIX Annual Technical Conference.
Collaborative Memory Pool in Cluster System. ICPP '07 Proceedings of the 2007 International Conference on Parallel Processing.
High-performance ethernet-based communications for future multi-core processors. Proceedings of the 2007 ACM/IEEE conference on Supercomputing.
RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations. Proceedings of the 14th international conference on Architectural support for programming languages and operating systems.
Scalable high performance main memory system using phase-change memory technology. Proceedings of the 36th annual international symposium on Computer architecture.
Disaggregated memory for expansion and sharing in blade servers. Proceedings of the 36th annual international symposium on Computer architecture.
The multikernel: a new OS architecture for scalable multicore systems. Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles.
PDRAM: a hybrid PRAM and DRAM main memory system. Proceedings of the 46th Annual Design Automation Conference.
The case for RAMClouds: scalable high-performance storage entirely in DRAM. ACM SIGOPS Operating Systems Review.
Page placement in hybrid memory systems. Proceedings of the international conference on Supercomputing.
Evaluating placement policies for managing capacity sharing in CMP architectures with private caches. ACM Transactions on Architecture and Code Optimization (TACO).

Cited by (1)

Evaluating Dynamics and Bottlenecks of Memory Collaboration in Cluster Systems. CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid 2012).

Abstract

With the rapid development of highly integrated distributed systems (cluster systems), especially those encapsulated within a single platform [28, 9], designers face interesting memory-hierarchy design choices aimed at avoiding swapping to disk storage, because disk swapping slows application execution drastically. Leveraging remote free memory through Memory Collaboration has been shown to be more cost-effective than overprovisioning memory for peak load requirements. Recent studies propose several ways of accessing under-utilized remote memory in static system configurations, but do not explore dynamic memory collaboration in detail. Dynamic collaboration is important given the run-time fluctuations in memory usage in clustered systems. In this paper, we propose an Autonomous Collaborative Memory System (ACMS) that manages memory resources dynamically at run time to optimize performance and provide QoS measures for the nodes participating in the system. We implement a prototype of the proposed ACMS, evaluate it with a wide range of real-world applications, and show a speedup of up to 3x over a non-collaborative memory system, with no perceivable performance impact on the nodes that provide memory. Based on our experiments, we analyze the remote memory access overhead in detail and provide insights for future optimizations.
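The abstract does not say how ACMS decides, at run time, which nodes borrow memory and which lend it. Purely as a hypothetical illustration of what dynamic memory collaboration could look like, the sketch below assumes a simple watermark policy: a node whose free memory falls below a low watermark becomes a client that would otherwise swap to disk, a node with free memory above a high watermark becomes a server, and a server lends only memory beyond a fixed reserve so that its own workload is protected. The names `Role`, `Node`, `classify`, `rebalance`, and the thresholds `LOW_FREE`, `HIGH_FREE`, and `RESERVE` are invented here and are not taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    CLIENT = "client"    # under memory pressure; would otherwise swap to disk
    SERVER = "server"    # has spare memory it can lend to other nodes
    NEUTRAL = "neutral"  # neither borrows nor lends


@dataclass
class Node:
    name: str
    total_mb: int
    used_mb: int

    @property
    def free_mb(self) -> int:
        return self.total_mb - self.used_mb


# Hypothetical thresholds, expressed as fractions of a node's total memory.
LOW_FREE = 0.10   # below this free fraction, the node requests remote memory
HIGH_FREE = 0.40  # above this free fraction, the node may lend memory
RESERVE = 0.25    # fraction a lender always keeps for its own load (donor QoS)


def classify(node: Node) -> Role:
    """Assign a role from the node's current free-memory fraction."""
    frac = node.free_mb / node.total_mb
    if frac < LOW_FREE:
        return Role.CLIENT
    if frac > HIGH_FREE:
        return Role.SERVER
    return Role.NEUTRAL


def lendable_mb(node: Node) -> int:
    """Memory a server can lend while still keeping its reserve."""
    return max(0, node.free_mb - int(RESERVE * node.total_mb))


def demand_mb(node: Node) -> int:
    """Memory a client needs to get back up to the LOW_FREE watermark."""
    return max(0, int(LOW_FREE * node.total_mb) - node.free_mb)


def rebalance(nodes: list[Node]) -> dict[tuple[str, str], int]:
    """Greedily match client demand to server capacity.

    Returns {(client_name, server_name): megabytes_granted}. Demand that no
    server can cover is left unmet (that client would still swap to disk).
    """
    roles = {n.name: classify(n) for n in nodes}
    capacity = {n.name: lendable_mb(n) for n in nodes if roles[n.name] is Role.SERVER}
    grants: dict[tuple[str, str], int] = {}
    for client in (n for n in nodes if roles[n.name] is Role.CLIENT):
        need = demand_mb(client)
        # Prefer the servers with the most spare memory first.
        for server in sorted(capacity, key=capacity.get, reverse=True):
            if need == 0:
                break
            take = min(need, capacity[server])
            if take > 0:
                grants[(client.name, server)] = take
                capacity[server] -= take
                need -= take
    return grants


if __name__ == "__main__":
    cluster = [Node("a", 4096, 4000), Node("b", 4096, 1024), Node("c", 4096, 3000)]
    print({n.name: classify(n).value for n in cluster})
    print(rebalance(cluster))
```

Running the script prints `{'a': 'client', 'b': 'server', 'c': 'neutral'}` and `{('a', 'b'): 313}`: node `a` borrows 313 MB from `b`, and `b` still keeps more than its 1024 MB reserve. Re-running the classification whenever memory usage changes is what would make such a policy dynamic rather than a static configuration; a real system would also need hysteresis and a way to reclaim lent memory when a server's own load grows.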