The last line of defense in the cache hierarchy before off-chip memory is critical in chip multiprocessors (CMPs) for both performance and power. This paper investigates different organizations for this last line of defense (assumed to be L2 in this paper) with the goal of reducing off-chip memory accesses. We evaluate the trade-offs between private L2 and address-interleaved shared L2 designs, noting their individual benefits and drawbacks. A possible imbalance in L2 demand across the CPUs favors a shared L2 organization, while interference between these demands can favor a private L2 organization. We propose a new architecture, called Shared Processor-Based Split L2, that captures the benefits of both organizations while avoiding many of their drawbacks. Using several applications from the SPEC OMP suite and a commercial benchmark, SPECjbb, on a complete system simulator, we demonstrate the benefits of this shared processor-based L2 organization. Our results show as much as 42.50% improvement in IPC over the private organization (11.52% on average) and as much as 42.22% improvement over the shared interleaved organization (9.76% on average).
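The difference between the organizations compared above can be sketched as a choice of which L2 bank serves a given access. The following Python fragment is only an illustrative sketch, not the paper's implementation: the line size, the bank count, the function names, and in particular the `allocation` table used for the processor-based split are all assumptions, since the abstract does not specify how splits are assigned or indexed.

```python
LINE_BYTES = 64  # assumed cache-line size
NUM_BANKS = 4    # assumed number of L2 banks (one per CPU in the private case)


def private_bank(cpu_id: int, addr: int) -> int:
    """Private L2: each CPU uses only its own bank, whatever the address."""
    return cpu_id % NUM_BANKS


def interleaved_bank(cpu_id: int, addr: int) -> int:
    """Address-interleaved shared L2: the line address selects the bank,
    so every CPU can reach every bank."""
    return (addr // LINE_BYTES) % NUM_BANKS


def split_bank(cpu_id: int, addr: int, allocation: dict[int, list[int]]) -> int:
    """Hypothetical processor-based split: a CPU is restricted to the banks
    listed for it in `allocation` and interleaves addresses across them.
    This mapping is an assumption made for illustration only."""
    splits = allocation[cpu_id]
    return splits[(addr // LINE_BYTES) % len(splits)]


if __name__ == "__main__":
    # CPU 0 is assumed to have a larger working set, so it gets three splits.
    allocation = {0: [0, 1, 2], 1: [3]}
    addr = 0x1040
    print(private_bank(0, addr))
    print(interleaved_bank(0, addr))
    print(split_bank(0, addr, allocation))
```

In this sketch, an uneven `allocation` lets a CPU with higher L2 demand receive more capacity than a private design would give it, while keeping each CPU's lines out of the other's banks, which is the kind of interference that an address-interleaved design allows.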