Server consolidation is becoming an increasingly popular technique to manage and utilize systems. This paper develops CMP memory systems for server consolidation where most sharing occurs within Virtual Machines (VMs). Our memory systems maximize shared memory accesses serviced within a VM, minimize interference among separate VMs, facilitate dynamic reassignment of VMs to processors and memory, and support content-based page sharing among VMs. We begin with a tiled architecture where each of 64 tiles contains a processor, private L1 caches, and an L2 bank. First, we reveal why single-level directory designs fail to meet workload consolidation goals. Second, we develop the paper's central idea of imposing a two-level virtual (or logical) coherence hierarchy on a physically flat CMP that harmonizes with VM assignment. Third, we show that the best of our two virtual hierarchy (VH) variants performs 12-58% better than the best alternative flat directory protocol when consolidating Apache, OLTP, and Zeus commercial workloads on our simulated 64-core CMP.
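The idea of a two-level virtual hierarchy on a flat 64-tile CMP can be illustrated with a small sketch. This is not the paper's mechanism: the abstract does not say how home tiles are chosen, so the modulo interleaving, the 64-byte block size, and the names `VM`, `first_level_home`, and `second_level_home` are all assumptions made for illustration. The sketch only shows that a first-level home can be confined to the tiles of the requesting VM, while a second-level home is chosen across the whole chip.

```python
from dataclasses import dataclass
from typing import Tuple

NUM_TILES = 64    # from the abstract: 64 tiles, each with an L2 bank
BLOCK_BYTES = 64  # assumed cache block size (not given in the abstract)


@dataclass(frozen=True)
class VM:
    """Hypothetical record of which tiles a VM currently runs on."""
    name: str
    tiles: Tuple[int, ...]


def block_number(addr: int) -> int:
    """Drop the block-offset bits of a physical address."""
    return addr // BLOCK_BYTES


def first_level_home(vm: VM, addr: int) -> int:
    """Level 1: choose a home tile among the VM's own tiles.

    Assumed policy: interleave blocks across the VM's tiles, so that
    sharing inside the VM is resolved without leaving those tiles.
    """
    return vm.tiles[block_number(addr) % len(vm.tiles)]


def second_level_home(addr: int) -> int:
    """Level 2: choose a global home among all tiles.

    Assumed policy: interleave blocks across the whole chip; this level
    would only be consulted when level 1 cannot satisfy a request.
    """
    return block_number(addr) % NUM_TILES


if __name__ == "__main__":
    vm = VM(name="vm2", tiles=tuple(range(16, 24)))  # an 8-tile VM
    addr = 0x1240
    print("level-1 home tile:", first_level_home(vm, addr))
    print("level-2 home tile:", second_level_home(addr))
```

Reassigning a VM to different tiles in this sketch amounts to building a new `VM` record with a different `tiles` tuple. The level-2 mapping is unaffected, which is one way to read the abstract's goal of dynamic reassignment.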