Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Computer
Xen and the art of virtualization
SOSP '03 Proceedings of the nineteenth ACM symposium on Operating systems principles
CQoS: a framework for enabling QoS in shared caches of CMP platforms
Proceedings of the 18th annual international conference on Supercomputing
Chip Multithreading: Opportunities and Challenges
HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
Intel Virtualization Technology
Computer
Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling
Proceedings of the 32nd annual international symposium on Computer Architecture
Adaptive set pinning: managing shared caches in chip multiprocessors
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
Visions for application development on hybrid computing systems
Parallel Computing
Software-directed combined cpu/link voltage scaling fornoc-based cmps
SIGMETRICS '08 Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A practical approach of memory access parallelization to exploit multiple off-chip DDR memories
Proceedings of the 45th annual Design Automation Conference
An open-loop flow control scheme based on the accurate global information of on-chip communication
Proceedings of the conference on Design, automation and test in Europe
A compiler-directed data prefetching scheme for chip multiprocessors
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Tradeoffs in designing accelerator architectures for visual computing
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
ACM SIGARCH Computer Architecture News
Area-efficiency in CMP core design: co-optimization of microarchitecture and physical design
ACM SIGARCH Computer Architecture News
Compiler directed network-on-chip reliability enhancement for chip multiprocessors
Proceedings of the ACM SIGPLAN/SIGBED 2010 conference on Languages, compilers, and tools for embedded systems
Constraint-aware large-scale CMP cache design
HiPC'07 Proceedings of the 14th international conference on High performance computing
Avoiding cache thrashing due to private data placement in last-level cache for manycore scaling
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Area-efficient floorplans and interconnects for homogeneous multi-core architectures
International Journal of High Performance Systems Architecture
Adaptive prefetching for shared cache based chip multiprocessors
Proceedings of the Conference on Design, Automation and Test in Europe
Exploiting compression opportunities to improve SpMxV performance on shared memory systems
ACM Transactions on Architecture and Code Optimization (TACO)
Multi-CMP system with data communication on the fly
The Journal of Supercomputing
SimPL: an effective placement algorithm
Proceedings of the International Conference on Computer-Aided Design
Proceedings of the 2012 ACM SIGPLAN Workshop on Memory Systems Performance and Correctness
DIMSim: a rapid two-level cache simulation approach for deadline-based MPSoCs
Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Efficient Reuse Distance Analysis of Multicore Scaling for Loop-Based Parallel Programs
ACM Transactions on Computer Systems (TOCS)
Studying multicore processor scaling via reuse distance analysis
Proceedings of the 40th Annual International Symposium on Computer Architecture
Taming the complexity of coordinated place and route
Proceedings of the 50th Annual Design Automation Conference
Hi-index | 0.00 |
With the advent of dual-core chips in the marketplace, small-scale CMP (chip multiprocessor) architectures are becoming commonplace. We expect a continuing trend of increasing the number of cores on a die to maximize the performance/power efficiency of a single chip. We believe an era of large-scale CMPs (LCMPs) with several tens to hundreds of cores is on the way, but as of now architects have little understanding of how best to build a cache hierarchy given such a large number of cores/threads to support. With this in mind, our initial goals are to prune the cache design space for LCMPs by characterizing basic server workload behavior in such an environment.In this paper, we describe the range of methodologies that we are developing to overcome the challenges of exploring the cache design space for LCMP platforms. We then focus on employing a trace-driven approach to characterizing one key server workload (OLTP) in both a homogeneous and a heterogeneous workload environment. We study the effect of increasing threads (from 1 to 128) on a three-level cache hierarchy with emphasis on second and third level caches. We study the effect of varying sizes at these cache levels and show the effects of threads contending for cache space, the effects of prefetching instruction addresses, and the effects of inclusion. We make initial observations and conclusions about the factors on which LCMP cache hierarchy design decisions should be based and discuss future work.