The McPAT Framework for Multicore and Manycore Architectures: Simultaneously Modeling Power, Area, and Timing

Authors:
Sheng Li;Jung Ho Ahn;Richard D. Strong;Jay B. Brockman;Dean M. Tullsen;Norman P. Jouppi
Affiliations:
HP Labs;Seoul National University;University of California, San Diego;University of Notre Dame;University of California, San Diego;HP Labs
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2013

Abstract

This article introduces McPAT, an integrated power, area, and timing modeling framework that supports comprehensive design space exploration for multicore and manycore processor configurations ranging from 90 nm to 22 nm and beyond. At the microarchitectural level, McPAT includes models for the fundamental components of a complete chip multiprocessor: in-order and out-of-order processor cores, networks-on-chip, shared caches, and integrated system components such as memory controllers and Ethernet controllers. At the circuit level, McPAT supports detailed modeling of critical-path timing, area, and power. At the technology level, McPAT models timing, area, and power for the device types forecast in the ITRS roadmap. McPAT has a flexible XML interface to facilitate its use with many performance simulators. Combined with a performance simulator, McPAT enables architects to accurately quantify the cost of new ideas and to assess trade-offs between architectures using new metrics such as the Energy-Delay-Area² Product (EDA²P) and the Energy-Delay-Area Product (EDAP). This article explores the interconnect options of future manycore processors by varying the degree of clustering over generations of process technologies. Clustering introduces trade-offs between area and performance: the interconnects needed to group cores into clusters incur area overhead, but many applications can make good use of them because of synergies from cache sharing. Combining the power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks for manycore designs at the 22 nm technology node shows that 8-core clustering gives the best energy-delay product, whereas when die area is taken into account, 4-core clustering gives the best EDA²P and EDAP.
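
The metrics named in the abstract can be illustrated with a minimal Python sketch. It assumes the definitions the metric names spell out: EDP = energy × delay, EDAP = energy × delay × area, and EDA²P = energy × delay × area². The `DesignPoint` class, the `best` helper, and the two design points are hypothetical and exist only for illustration. The numbers are placeholders, not results from the article or output from McPAT.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class DesignPoint:
    """Hypothetical container for one design's energy, delay, and area."""
    name: str
    energy_j: float   # total energy in joules
    delay_s: float    # execution time in seconds
    area_mm2: float   # die area in square millimeters

    def edp(self) -> float:
        # Energy-Delay Product (assumed definition: E * D)
        return self.energy_j * self.delay_s

    def edap(self) -> float:
        # Energy-Delay-Area Product (assumed definition: E * D * A)
        return self.edp() * self.area_mm2

    def eda2p(self) -> float:
        # Energy-Delay-Area^2 Product (assumed definition: E * D * A^2)
        return self.edp() * self.area_mm2 ** 2


def best(points: Iterable[DesignPoint],
         metric: Callable[[DesignPoint], float]) -> DesignPoint:
    """Return the design point with the lowest value of the given metric."""
    return min(points, key=metric)


# Placeholder values chosen only to show that the ranking can change
# once area enters the metric; they are NOT taken from the article.
points = [
    DesignPoint("design_a", energy_j=10.0, delay_s=2.0, area_mm2=100.0),
    DesignPoint("design_b", energy_j=9.0, delay_s=2.0, area_mm2=150.0),
]

for p in points:
    print(p.name, p.edp(), p.edap(), p.eda2p())

print("best EDP:  ", best(points, DesignPoint.edp).name)
print("best EDAP: ", best(points, DesignPoint.edap).name)
print("best EDA2P:", best(points, DesignPoint.eda2p).name)
```

With these placeholder inputs, the design with the lower energy-delay product is not the one with the lower area-aware metrics. This is the kind of ranking change the abstract describes between 8-core and 4-core clustering.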