Virtual hierarchies to support server consolidation

Authors:
Michael R. Marty;Mark D. Hill
Affiliations:
University of Wisconsin-Madison, Madison, WI;University of Wisconsin-Madison, Madison, WI
Venue:
Proceedings of the 34th annual international symposium on Computer architecture
Year:
2007

Citing 34
Cited 51

Cache Invalidation Patterns in Shared-Memory Multiprocessors

IEEE Transactions on Computers
Synchronization and communication in the T3E multiprocessor

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
The SGI Origin: a ccNUMA highly scalable server

Proceedings of the 24th annual international symposium on Computer architecture
Disco: running commodity operating systems on scalable multiprocessors

ACM Transactions on Computer Systems (TOCS)
The directory-based cache coherence protocol for the DASH multiprocessor

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Reconfigurable caches and their application to media processing

Proceedings of the 27th annual international symposium on Computer architecture
Piranha: a scalable architecture based on single-chip multiprocessing

Proceedings of the 27th annual international symposium on Computer architecture
An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Starfire: Extending the SMP Envelope

IEEE Micro
WildFire: A Scalable Path for SMPs

HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Variability in Architectural Simulations of Multi-Threaded Workloads

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Token coherence: decoupling performance and correctness

Proceedings of the 30th annual international symposium on Computer architecture
A New Memory Monitoring Scheme for Memory-Aware Scheduling and Partitioning

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Dynamic Partitioning of Shared Cache Memory

The Journal of Supercomputing
CQoS: a framework for enabling QoS in shared caches of CMP platforms

Proceedings of the 18th annual international conference on Supercomputing
Dynamic reconfiguration: Basic building blocks for autonomic computing on IBM pSeries servers

IBM Systems Journal
Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Managing Wire Delay in Large Chip-Multiprocessor Caches

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Improving Multiple-CMP Systems Using Token Coherence

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Memory resource management in VMware ESX server

OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementation
Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors

Proceedings of the 32nd annual international symposium on Computer Architecture
Optimizing Replication, Communication, and Capacity Allocation in CMPs

Proceedings of the 32nd annual international symposium on Computer Architecture
Organizing the Last Line of Defense before Hitting the Memory Wall for CMPs

HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
A NUCA substrate for flexible CMP cache sharing

Proceedings of the 19th annual international conference on Supercomputing
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset

ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Cooperative Caching for Chip Multiprocessors

Proceedings of the 33rd annual international symposium on Computer Architecture
Architectural support for operating system-driven CMP cache management

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Communist, utilitarian, and capitalist cache policies on CMPs: caches as a shared resource

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
IPC Considered Harmful for Multiprocessor Workloads

IEEE Micro
Coherence Ordering for Ring-based Chip Multiprocessors

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Molecular Caches: A caching structure for dynamic creation of application-specific Heterogeneous cache regions

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
ASR: Adaptive Selective Replication for CMP Caches

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Managing Distributed, Shared L2 Caches through OS-Level Page Allocation

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
POWER4 system microarchitecture

IBM Journal of Research and Development

Characterization of Apache web server with Specweb2005

MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
Characterization & analysis of a server consolidation benchmark

Proceedings of the fourth ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Utilizing shared data in chip multiprocessors with the Nahalal architecture

Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
A consistency architecture for hierarchical shared caches

Proceedings of the twentieth annual symposium on Parallelism in algorithms and architectures
Virtual Circuit Tree Multicasting: A Case for On-Chip Hardware Multicast Support

ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
Towards modeling & analysis of consolidated CMP servers

ACM SIGARCH Computer Architecture News
Making secure processors OS- and performance-friendly

ACM Transactions on Architecture and Code Optimization (TACO)
ACM: An Efficient Approach for Managing Shared Caches in Chip Multiprocessors

HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
Mixed-mode multicore reliability

Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Token tenure: PATCHing token counting using directory-based cache coherence

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Implementing high availability memory with a duplication cache

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Efficient unicast and multicast support for CMPs

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Virtual tree coherence: Leveraging regions and in-network multicast trees for scalable cache coherence

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Push-assisted migration of real-time tasks in multi-core processors

Proceedings of the 2009 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
Reactive NUCA: near-optimal block placement and replication in distributed caches

Proceedings of the 36th annual international symposium on Computer architecture
On-Line Multiple-Strip Packing

COCOA '09 Proceedings of the 3rd International Conference on Combinatorial Optimization and Applications
Resource pool management: Reactive versus proactive or let's be friends

Computer Networks: The International Journal of Computer and Telecommunications Networking
A tagless coherence directory

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
A scalable organization for distributed directories

Journal of Systems Architecture: the EUROMICRO Journal
qTLB: looking inside the look-aside buffer

HiPC'07 Proceedings of the 14th international conference on High performance computing
Token tenure and PATCH: A predictive/adaptive token-counting hybrid

ACM Transactions on Architecture and Code Optimization (TACO)
WAYPOINT: scaling coherence to thousand-core architectures

Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Latency criticality aware on-chip communication

Proceedings of the Conference on Design, Automation and Test in Europe
Online multiple-strip packing

Theoretical Computer Science
Virtual Snooping: Filtering Snoops in Virtualized Multi-cores

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Fractal Coherence: Scalably Verifiable Cache Coherence

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Virtualizing network-on-chip resources in chip-multiprocessors

Microprocessors & Microsystems
Research note: C-AMTE: A location mechanism for flexible cache management in chip multiprocessors

Journal of Parallel and Distributed Computing
Efficient dynamic task scheduling in virtualized data centers with fuzzy prediction

Journal of Network and Computer Applications
Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks

Proceedings of the 38th annual international symposium on Computer architecture
Kilo-NOC: a heterogeneous network-on-chip architecture for scalability and service guarantees

Proceedings of the 38th annual international symposium on Computer architecture
Scalable power control for many-core architectures running multi-threaded applications

Proceedings of the 38th annual international symposium on Computer architecture
Understanding scheduling implications for scientific applications in clouds

Proceedings of the 9th International Workshop on Middleware for Grids, Clouds and e-Science
Switch-based packing technique to reduce traffic and latency in token coherence

Journal of Parallel and Distributed Computing
Manager-client pairing: a framework for implementing coherence hierarchies

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Topology-Aware quality-of-service support in highly integrated chip multiprocessors

ISCA'10 Proceedings of the 2010 international conference on Computer Architecture
Network-on-Chip virtualization in Chip-Multiprocessor Systems

Journal of Systems Architecture: the EUROMICRO Journal
An optimized multicore cache coherence design for exploiting communication locality

Proceedings of the great lakes symposium on VLSI
Measuring interference between live datacenter applications

SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Moths: Mobile threads for on-chip networks

ACM Transactions on Embedded Computing Systems (TECS) - Special section on ESTIMedia'12, LCTES'11, rigorous embedded systems design, and multiprocessor system-on-chip for cyber-physical systems
The McPAT Framework for Multicore and Manycore Architectures: Simultaneously Modeling Power, Area, and Timing

ACM Transactions on Architecture and Code Optimization (TACO)
Virtualizing power distribution in datacenters

Proceedings of the 40th Annual International Symposium on Computer Architecture
RISO: relaxed network-on-chip isolation for cloud processors

Proceedings of the 50th Annual Design Automation Conference
Dynamic directories: a mechanism for reducing on-chip interconnect power in multicores

DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Power-efficient calibration and reconfiguration for on-chip optical communication

DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Jigsaw: scalable software-defined caches

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Building expressive, area-efficient coherence directories

PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
The sharing architecture: sub-core configurability for IaaS clouds

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Underprovisioning backup power infrastructure for datacenters

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Locality-oblivious cache organization leveraging single-cycle multi-hop NoCs

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
High-performance fractal coherence

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems

Abstract

Server consolidation is becoming an increasingly popular technique to manage and utilize systems. This paper develops CMP memory systems for server consolidation where most sharing occurs within Virtual Machines (VMs). Our memory systems maximize shared memory accesses serviced within a VM, minimize interference among separate VMs, facilitate dynamic reassignment of VMs to processors and memory, and support content-based page sharing among VMs. We begin with a tiled architecture where each of 64 tiles contains a processor, private L1 caches, and an L2 bank. First, we reveal why single-level directory designs fail to meet workload consolidation goals. Second, we develop the paper's central idea of imposing a two-level virtual (or logical) coherence hierarchy on a physically flat CMP that harmonizes with VM assignment. Third, we show that the best of our two virtual hierarchy (VH) variants performs 12-58% better than the best alternative flat directory protocol when consolidating Apache, OLTP, and Zeus commercial workloads on our simulated 64-core CMP.
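
The two-level idea in the abstract can be illustrated with a small sketch. It is not the paper's protocol: the abstract does not say how either level is implemented, so the address interleaving, the 64-byte block size, the choice to model the second level as a chip-wide home tile, and the names `VM`, `level_one_home`, `level_two_home`, and `homes_to_try` are all assumptions made for illustration. Only the 64-tile count comes from the abstract.

```python
from dataclasses import dataclass

NUM_TILES = 64   # from the abstract: 64 tiles, each with an L2 bank
BLOCK_BITS = 6   # assumption: 64-byte cache blocks


@dataclass(frozen=True)
class VM:
    """Hypothetical record of which tiles a VM currently occupies."""
    name: str
    tiles: tuple


def level_one_home(vm: VM, address: int) -> int:
    """First level: pick a home tile among the VM's own tiles.

    Assumption: blocks are interleaved across the VM's tiles by block number.
    """
    block = address >> BLOCK_BITS
    return vm.tiles[block % len(vm.tiles)]


def level_two_home(address: int) -> int:
    """Second level: pick a chip-wide home, independent of VM assignment.

    Assumption: modeled as a tile chosen by interleaving over all 64 tiles.
    """
    block = address >> BLOCK_BITS
    return block % NUM_TILES


def homes_to_try(vm: VM, address: int) -> list:
    """Order in which a miss would consult the two levels in this sketch."""
    return [level_one_home(vm, address), level_two_home(address)]


vm0 = VM("vm0", (8, 9, 10, 11))
first, second = homes_to_try(vm0, 0x1040)
```

In this sketch a miss is looked up first at a home inside the VM's own tiles, so sharing within a VM stays local, and only falls back to the chip-wide level when the first level cannot satisfy it. Reassigning a VM to other tiles would amount to changing `VM.tiles`.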