Exploring the cache design space for large scale CMPs

Authors:
Lisa Hsu;Ravi Iyer;Srihari Makineni;Steve Reinhardt;Donald Newell
Affiliations:
University of Michigan, Ann Arbor;Systems Technology Lab, Intel Corporation;Systems Technology Lab, Intel Corporation;University of Michigan, Ann Arbor;Systems Technology Lab, Intel Corporation
Venue:
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Year:
2005

Citing 9
Cited 23

Simultaneous multithreading: maximizing on-chip parallelism

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
MemorIES: a programmable, real-time hardware emulation tool for multiprocessor server design

ACM SIGPLAN Notices
A Single-Chip Multiprocessor

Computer
Xen and the art of virtualization

SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
CQoS: a framework for enabling QoS in shared caches of CMP platforms

Proceedings of the 18th annual international conference on Supercomputing
Chip Multithreading: Opportunities and Challenges

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Intel Virtualization Technology

Computer
Niagara: A 32-Way Multithreaded Sparc Processor

IEEE Micro
Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling

Proceedings of the 32nd annual international symposium on Computer Architecture

Exploring Large-Scale CMP Architectures Using ManySim

IEEE Micro
Adaptive set pinning: managing shared caches in chip multiprocessors

Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Visions for application development on hybrid computing systems

Parallel Computing
Software-directed combined CPU/link voltage scaling for NoC-based CMPs

SIGMETRICS '08 Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A practical approach of memory access parallelization to exploit multiple off-chip DDR memories

Proceedings of the 45th annual Design Automation Conference
An open-loop flow control scheme based on the accurate global information of on-chip communication

Proceedings of the conference on Design, automation and test in Europe
A compiler-directed data prefetching scheme for chip multiprocessors

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Tradeoffs in designing accelerator architectures for visual computing

Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
How to simulate 1000 cores

ACM SIGARCH Computer Architecture News
Area-efficiency in CMP core design: co-optimization of microarchitecture and physical design

ACM SIGARCH Computer Architecture News
Compiler directed network-on-chip reliability enhancement for chip multiprocessors

Proceedings of the ACM SIGPLAN/SIGBED 2010 conference on Languages, compilers, and tools for embedded systems
Constraint-aware large-scale CMP cache design

HiPC'07 Proceedings of the 14th international conference on High performance computing
Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling

ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Area-efficient floorplans and interconnects for homogeneous multi-core architectures

International Journal of High Performance Systems Architecture
Adaptive prefetching for shared cache based chip multiprocessors

Proceedings of the Conference on Design, Automation and Test in Europe
Exploiting compression opportunities to improve SpMxV performance on shared memory systems

ACM Transactions on Architecture and Code Optimization (TACO)
Multi-CMP system with data communication on the fly

The Journal of Supercomputing
SimPL: an effective placement algorithm

Proceedings of the International Conference on Computer-Aided Design
Identifying optimal multicore cache hierarchies for loop-based parallel programs via reuse distance analysis

Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
DIMSim: a rapid two-level cache simulation approach for deadline-based MPSoCs

Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Efficient Reuse Distance Analysis of Multicore Scaling for Loop-Based Parallel Programs

ACM Transactions on Computer Systems (TOCS)
Studying multicore processor scaling via reuse distance analysis

Proceedings of the 40th Annual International Symposium on Computer Architecture
Taming the complexity of coordinated place and route

Proceedings of the 50th Annual Design Automation Conference

Abstract

With the advent of dual-core chips in the marketplace, small-scale CMP (chip multiprocessor) architectures are becoming commonplace. We expect a continuing trend of increasing the number of cores on a die to maximize the performance/power efficiency of a single chip. We believe an era of large-scale CMPs (LCMPs) with several tens to hundreds of cores is on the way, but as of now architects have little understanding of how best to build a cache hierarchy given such a large number of cores/threads to support. With this in mind, our initial goals are to prune the cache design space for LCMPs by characterizing basic server workload behavior in such an environment.

In this paper, we describe the range of methodologies that we are developing to overcome the challenges of exploring the cache design space for LCMP platforms. We then focus on employing a trace-driven approach to characterizing one key server workload (OLTP) in both a homogeneous and a heterogeneous workload environment. We study the effect of increasing threads (from 1 to 128) on a three-level cache hierarchy with emphasis on second and third level caches. We study the effect of varying sizes at these cache levels and show the effects of threads contending for cache space, the effects of prefetching instruction addresses, and the effects of inclusion. We make initial observations and conclusions about the factors on which LCMP cache hierarchy design decisions should be based and discuss future work.
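As a rough illustration of what a trace-driven study of threads contending for shared cache space can look like, the following is a minimal sketch and not the authors' methodology or tooling. It assumes a single shared set-associative LRU cache (the paper studies a three-level hierarchy), synthetic uniformly random address traces in place of real OLTP traces, disjoint per-thread working sets, and round-robin interleaving of threads. All names and parameters (`SetAssociativeCache`, `synthetic_trace`, `shared_cache_miss_rate`, the 256 KB working set, 64-byte lines, 8 ways) are hypothetical choices made for the example.

```python
from collections import OrderedDict
import random


class SetAssociativeCache:
    """LRU set-associative cache model that only counts hits and misses."""

    def __init__(self, size_bytes, line_bytes=64, ways=8):
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One OrderedDict per set; insertion order doubles as LRU order.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Look up one byte address; return True on a hit."""
        line = address // self.line_bytes
        cache_set = self.sets[line % self.num_sets]
        if line in cache_set:
            cache_set.move_to_end(line)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(cache_set) >= self.ways:
            cache_set.popitem(last=False)  # evict least recently used line
        cache_set[line] = True
        return False

    def miss_rate(self):
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


def synthetic_trace(thread_id, length, working_set_bytes, rng):
    """Hypothetical stand-in for a real trace: uniform random addresses
    inside a per-thread working set that is disjoint from other threads."""
    base = thread_id * working_set_bytes
    for _ in range(length):
        yield base + rng.randrange(working_set_bytes)


def shared_cache_miss_rate(num_threads, cache_bytes, refs_per_thread=20000,
                           working_set_bytes=256 * 1024, seed=0):
    """Replay num_threads traces, interleaved round-robin, through one
    shared cache and return the overall miss rate."""
    rng = random.Random(seed)
    cache = SetAssociativeCache(cache_bytes)
    traces = [synthetic_trace(t, refs_per_thread, working_set_bytes, rng)
              for t in range(num_threads)]
    for refs in zip(*traces):  # one reference from each thread per round
        for address in refs:
            cache.access(address)
    return cache.miss_rate()


if __name__ == "__main__":
    one_mib = 1024 * 1024
    for threads in (1, 2, 4, 8, 16):
        rate = shared_cache_miss_rate(threads, one_mib)
        print(f"{threads:3d} threads, 1 MiB shared cache: miss rate {rate:.3f}")
```

Under these assumptions, the miss rate stays near the cold-miss level while the combined working sets fit in the cache and rises once they exceed it. Sweeping `cache_bytes` as well as `num_threads` gives the kind of size-versus-thread-count grid the abstract describes, although real traces would behave differently from uniform random ones.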