The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
The SimpleScalar tool set, version 2.0
ACM SIGARCH Computer Architecture News
MTCMOS hierarchical sizing based on mutual exclusive discharge patterns
DAC '98 Proceedings of the 35th annual Design Automation Conference
Power considerations in the design of the Alpha 21264 microprocessor
DAC '98 Proceedings of the 35th annual Design Automation Conference
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Energy-driven integrated hardware-software optimizations using SimplePower
Proceedings of the 27th annual international symposium on Computer architecture
Proceedings of the 39th annual Design Automation Conference
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
The Alpha 21264 Microprocessor
IEEE Micro
Parameter variations and impact on circuits and microarchitecture
Proceedings of the 40th annual Design Automation Conference
The Design of a Register Renaming Unit
GLS '99 Proceedings of the Ninth Great Lakes Symposium on VLSI
Technology Independent Area and Delay Estimations for MicroprocessorBuilding Blocks
Technology Independent Area and Delay Estimations for MicroprocessorBuilding Blocks
Accurate pre-layout estimation of standard cell characteristics
Proceedings of the 41st annual Design Automation Conference
IBM Journal of Research and Development
Conjoined-Core Chip Multiprocessing
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Distributed sleep transistor network for power reduction
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling
Proceedings of the 32nd annual international symposium on Computer Architecture
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Controlling program execution through binary instrumentation
ACM SIGARCH Computer Architecture News - Special issue on the 2005 workshop on binary instrumentation and application
The M5 Simulator: Modeling Networked Systems
IEEE Micro
An 8-core, 64-thread, 64-bit power efficient sparc soc (niagara2)
Proceedings of the 2007 international symposium on Physical design
Performance counters and development of SPEC CPU2006
ACM SIGARCH Computer Architecture News
Virtual hierarchies to support server consolidation
Proceedings of the 34th annual international symposium on Computer architecture
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
The PARSEC benchmark suite: characterization and architectural implications
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Achieving 10 Gb/s using safe and transparent network interface virtualization
Proceedings of the 2009 ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
ORION 2.0: a fast and accurate NoC power and area model for early-stage design space exploration
Proceedings of the Conference on Design, Automation and Test in Europe
Hotspot: acompact thermal modeling methodology for early-stage VLSI design
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
The structural simulation toolkit
ACM SIGMETRICS Performance Evaluation Review - Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
Computer Architecture, Fifth Edition: A Quantitative Approach
Computer Architecture, Fifth Edition: A Quantitative Approach
Cuckoo directory: A scalable directory for many-core systems
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
ACM SIGARCH Computer Architecture News
Proceedings of the International Conference on Computer-Aided Design
Analysis and future trend of short-circuit power
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Reducing memory access latency with asymmetric DRAM bank organizations
Proceedings of the 40th Annual International Symposium on Computer Architecture
Memory-centric system interconnect design with hybrid memory cubes
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
Hi-index | 0.00 |
This article introduces McPAT, an integrated power, area, and timing modeling framework that supports comprehensive design space exploration for multicore and manycore processor configurations ranging from 90nm to 22nm and beyond. At microarchitectural level, McPAT includes models for the fundamental components of a complete chip multiprocessor, including in-order and out-of-order processor cores, networks-on-chip, shared caches, and integrated system components such as memory controllers and Ethernet controllers. At circuit level, McPAT supports detailed modeling of critical-path timing, area, and power. At technology level, McPAT models timing, area, and power for the device types forecast in the ITRS roadmap. McPAT has a flexible XML interface to facilitate its use with many performance simulators. Combined with a performance simulator, McPAT enables architects to accurately quantify the cost of new ideas and assess trade-offs of different architectures using new metrics such as Energy-Delay-Area2 Product (EDA2P) and Energy-Delay-Area Product (EDAP). This article explores the interconnect options of future manycore processors by varying the degree of clustering over generations of process technologies. Clustering will bring interesting trade-offs between area and performance because the interconnects needed to group cores into clusters incur area overhead, but many applications can make good use of them due to synergies from cache sharing. Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks for manycore designs at the 22nm technology shows that 8-core clustering gives the best energy-delay product, whereas when die area is taken into account, 4-core clustering gives the best EDA2P and EDAP.