The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Digital integrated circuits: a design perspective
Digital integrated circuits: a design perspective
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
The SimpleScalar tool set, version 2.0
ACM SIGARCH Computer Architecture News
Power considerations in the design of the Alpha 21264 microprocessor
DAC '98 Proceedings of the 35th annual Design Automation Conference
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
The Alpha 21264 Microprocessor
IEEE Micro
Parameter variations and impact on circuits and microarchitecture
Proceedings of the 40th annual Design Automation Conference
Technology Independent Area and Delay Estimations for MicroprocessorBuilding Blocks
Technology Independent Area and Delay Estimations for MicroprocessorBuilding Blocks
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Interconnections in Multi-Core Architectures: Understanding Mechanisms, Overheads and Scaling
Proceedings of the 32nd annual international symposium on Computer Architecture
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Power Gating with Multiple Sleep Modes
ISQED '06 Proceedings of the 7th International Symposium on Quality Electronic Design
Controlling program execution through binary instrumentation
ACM SIGARCH Computer Architecture News - Special issue on the 2005 workshop on binary instrumentation and application
The M5 Simulator: Modeling Networked Systems
IEEE Micro
Computer Architecture, Fourth Edition: A Quantitative Approach
Computer Architecture, Fourth Edition: A Quantitative Approach
Performance counters and development of SPEC CPU2006
ACM SIGARCH Computer Architecture News
ISCA '08 Proceedings of the 35th Annual International Symposium on Computer Architecture
The PARSEC benchmark suite: characterization and architectural implications
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
ORION 2.0: a fast and accurate NoC power and area model for early-stage design space exploration
Proceedings of the Conference on Design, Automation and Test in Europe
Analysis and future trend of short-circuit power
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Resistive computation: avoiding the power wall with low-leakage, STT-MRAM based computing
Proceedings of the 37th annual international symposium on Computer architecture
Cost-aware three-dimensional (3D) many-core multiprocessor design
Proceedings of the 47th Design Automation Conference
Cost-driven 3D integration with interconnect layers
Proceedings of the 47th Design Automation Conference
Fabrication cost analysis and cost-aware design space exploration for 3-D ICs
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Moneta: A High-Performance Storage Array Architecture for Next-Generation, Non-volatile Memories
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
LOFT: A High Performance Network-on-Chip Providing Quality-of-Service Support
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
The ZCache: Decoupling Ways and Associativity
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Single-Chip Heterogeneous Computing: Does the Future Include Custom Logic, FPGAs, and GPGPUs?
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Characterizing the impact of process variation on 45 nm NoC-based CMPs
Journal of Parallel and Distributed Computing
The structural simulation toolkit
ACM SIGMETRICS Performance Evaluation Review - Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
ACM SIGMETRICS Performance Evaluation Review - Special issue on the 1st international workshop on performance modeling, benchmarking and simulation of high performance computing systems (PMBS 10)
Run-time energy management of manycore systems through reconfigurable interconnects
Proceedings of the 21st edition of the great lakes symposium on Great lakes symposium on VLSI
EnerJ: approximate data types for safe and general low-power computation
Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation
Proceedings of the 38th annual international symposium on Computer architecture
Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks
Proceedings of the 38th annual international symposium on Computer architecture
Adaptive granularity memory systems: a tradeoff between storage efficiency and throughput
Proceedings of the 38th annual international symposium on Computer architecture
Proceedings of the 38th annual international symposium on Computer architecture
A study on factors influencing power consumption in multithreaded and multicore CPUs
WSEAS Transactions on Computers
An energy-efficient adaptive hybrid cache
Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
ACM SIGARCH Computer Architecture News
Proceedings of the 48th Design Automation Conference
A reuse-aware prefetching scheme for scratchpad memory
Proceedings of the 48th Design Automation Conference
Matching cache access behavior and bit error pattern for high performance low Vcc L1 cache
Proceedings of the 48th Design Automation Conference
Token3D: reducing temperature in 3d die-stacked CMPs through cycle-level power control mechanisms
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
System implications of memory reliability in exascale computing
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Assuring application-level correctness against soft errors
Proceedings of the International Conference on Computer-Aided Design
Proceedings of the International Conference on Computer-Aided Design
Improving System Energy Efficiency with Memory Rank Subsetting
ACM Transactions on Architecture and Code Optimization (TACO)
Architecture support for disciplined approximate programming
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
System-level integrated server architectures for scale-out datacenters
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
A resistive TCAM accelerator for data-intensive computing
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Three-dimensional Integrated Circuits: Design, EDA, and Architecture
Foundations and Trends in Electronic Design Automation
Link-time optimization for power efficiency in a tagless instruction cache
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Design-time performance evaluation of thermal management policies for SRAM and RRAM based 3D MPSoCs
Proceedings of the great lakes symposium on VLSI
Looking back and looking forward: power, performance, and upheaval
Communications of the ACM
A limits study of benefits from nanostore-based future data-centric system architectures
Proceedings of the 9th conference on Computing Frontiers
Proceedings of the 49th Annual Design Automation Conference
Architecture support for accelerator-rich CMPs
Proceedings of the 49th Annual Design Automation Conference
Hybrid DRAM/PRAM-based main memory for single-chip CPU/GPU
Proceedings of the 49th Annual Design Automation Conference
Write performance improvement by hiding R drift latency in phase-change RAM
Proceedings of the 49th Annual Design Automation Conference
Thermal management of a many-core processor under fine-grained parallelism
Euro-Par'11 Proceedings of the 2011 international conference on Parallel Processing
Instruction-based energy estimation methodology for asymmetric manycore processor simulations
Proceedings of the 5th International ICST Conference on Simulation Tools and Techniques
SST + gem5 = a scalable simulation infrastructure for high performance computing
Proceedings of the 5th International ICST Conference on Simulation Tools and Techniques
Proceedings of the 26th ACM international conference on Supercomputing
Simulating the future kilo-x86-64 core processors and their infrastructure
Proceedings of the 45th Annual Simulation Symposium
Thermal-aware sampling in architectural simulation
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
TAP: token-based adaptive power gating
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
BiN: a buffer-in-NUCA scheme for accelerator-rich CMPs
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
HANDS: heterogeneous architectures and networks-on-chip design and simulation
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
XIOSim: power-performance modeling of mobile x86 cores
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
MultiScale: memory system DVFS with multiple memory controllers
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Energy-efficient scheduling on heterogeneous multi-core architectures
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
CHARM: a composable heterogeneous accelerator-rich microprocessor
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Energy-efficient GPU design with reconfigurable in-package graphics memory
Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design
Lane decoupling for improving the timing-error resiliency of wide-SIMD architectures
Proceedings of the 39th Annual International Symposium on Computer Architecture
Proceedings of the 39th Annual International Symposium on Computer Architecture
The dynamic granularity memory system
Proceedings of the 39th Annual International Symposium on Computer Architecture
Power-aware multi-core simulation for early design stage hardware/software co-optimization
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Workload and power budget partitioning for single-chip heterogeneous processors
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
When less is more (LIMO):controlled parallelism forimproved efficiency
Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
Don't burn your mobile!: safe computational re-sprinting via model predictive control
Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
Reducing NBTI-induced processor wearout by exploiting the timing slack of instructions
Proceedings of the eighth IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis
MAGE: adaptive granularity and ECC for resilient and power efficient memory systems
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Efficient traffic aware power management in multicore communications processors
Proceedings of the eighth ACM/IEEE symposium on Architectures for networking and communications systems
LEAP: latency- energy- and area-optimized lookup pipeline
Proceedings of the eighth ACM/IEEE symposium on Architectures for networking and communications systems
Dynamic code duplication with vulnerability awareness for soft error detection on VLIW architectures
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Efficient Reuse Distance Analysis of Multicore Scaling for Loop-Based Parallel Programs
ACM Transactions on Computer Systems (TOCS)
Proceedings of the International Conference on Computer-Aided Design
CACTI-IO: CACTI with off-chip power-area-timing models
Proceedings of the International Conference on Computer-Aided Design
Architecture support for custom instructions with memory operations
Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
ACM Transactions on Architecture and Code Optimization (TACO)
Predicting Performance Impact of DVFS for Realistic Memory Systems
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
CoScale: Coordinating CPU and Memory System DVFS in Server Systems
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
MorphCore: An Energy-Efficient Microarchitecture for High Performance ILP and High Throughput TLP
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Composite Cores: Pushing Heterogeneity Into a Core
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Spatiotemporal Coherence Tracking
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Amoeba-Cache: Adaptive Blocks for Eliminating Waste in the Memory Hierarchy
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Neural Acceleration for General-Purpose Approximate Programs
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
TEAPOT: a toolset for evaluating performance, power and image quality on mobile graphics systems
Proceedings of the 27th international ACM conference on International conference on supercomputing
Proceedings of the 27th international ACM conference on International conference on supercomputing
Proceedings of the 27th international ACM conference on International conference on supercomputing
RFiof: an RF approach to I/O-pin and memory controller scalability for off-chip memories
Proceedings of the ACM International Conference on Computing Frontiers
Proceedings of the 23rd ACM international conference on Great lakes symposium on VLSI
Cherry-picking: exploiting process variations in dark-silicon homogeneous chip multi-processors
Proceedings of the Conference on Design, Automation and Test in Europe
Self-adaptive hybrid dynamic power management for many-core systems
Proceedings of the Conference on Design, Automation and Test in Europe
MALEC: a multiple access low energy cache
Proceedings of the Conference on Design, Automation and Test in Europe
Thermal-aware datapath merging for coarse-grained reconfigurable processors
Proceedings of the Conference on Design, Automation and Test in Europe
Continuous real-world inputs can open up alternative accelerator designs
Proceedings of the 40th Annual International Symposium on Computer Architecture
AC-DIMM: associative computing with STT-MRAM
Proceedings of the 40th Annual International Symposium on Computer Architecture
GPUWattch: enabling energy optimizations in GPGPUs
Proceedings of the 40th Annual International Symposium on Computer Architecture
Criticality stacks: identifying critical threads in parallel programs using synchronization behavior
Proceedings of the 40th Annual International Symposium on Computer Architecture
The locality-aware adaptive cache coherence protocol
Proceedings of the 40th Annual International Symposium on Computer Architecture
Lighting the dark silicon by exploiting heterogeneity on future processors
Proceedings of the 50th Annual Design Automation Conference
HaDeS: architectural synthesis for heterogeneous dark silicon chip multi-processors
Proceedings of the 50th Annual Design Automation Conference
VAWOM: temperature and process variation aware wearout management in 3D multicore architecture
Proceedings of the 50th Annual Design Automation Conference
Exploring the vulnerability of CMPs to soft errors with 3D stacked nonvolatile memory
ACM Journal on Emerging Technologies in Computing Systems (JETC)
A case study on the application of real phase-change RAM to main memory subsystem
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Statistical thermal modeling and optimization considering leakage power variations
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Analysis and runtime management of 3D systems with stacked DRAM for boosting energy efficiency
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
MAPG: memory access power gating
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Bloom filter-based dynamic wear leveling for phase-change RAM
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Silicon-aware distributed switch architecture for on-chip networks
Journal of Systems Architecture: the EUROMICRO Journal
Parallel frame rendering: trading responsiveness for energy on a mobile GPU
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Jigsaw: scalable software-defined caches
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Managing shared last-level cache in a heterogeneous multicore processor
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
MLP-aware dynamic instruction window resizing for adaptively exploiting both ILP and MLP
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
TLC: a tag-less cache for reducing dynamic first level cache energy
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Exploiting GPU peak-power and performance tradeoffs through reduced effective pipeline latency
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Warped gates: gating aware scheduling and power gating for GPGPUs
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Linearly compressed pages: a low-complexity, low-latency main memory compression framework
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Kiln: closing the performance gap between systems with and without persistence support
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Trace based phase prediction for tightly-coupled heterogeneous cores
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
A circuit-architecture co-optimization framework for exploring nonvolatile memory hierarchies
ACM Transactions on Architecture and Code Optimization (TACO)
ACM Transactions on Architecture and Code Optimization (TACO)
Market mechanisms for managing datacenters with heterogeneous microarchitectures
ACM Transactions on Computer Systems (TOCS)
The benefit of SMT in the multi-core era: flexibility towards degrees of thread-level parallelism
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Selecting representative benchmark inputs for exploring microprocessor design spaces
ACM Transactions on Architecture and Code Optimization (TACO)
System-level power estimation tool for embedded processor based platforms
Proceedings of the 6th Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools
Automated, retargetable back-annotation for host compiled performance and power modeling
Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis
EVA: an efficient vision architecture for mobile systems
Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
A generalized software framework for accurate and efficient management of performance goals
Proceedings of the Eleventh ACM International Conference on Embedded Software
Agent-based distributed power management for kilo-core processors
Proceedings of the International Conference on Computer-Aided Design
Improving platform energy: chip area trade-off in near-threshold computing environment
Proceedings of the International Conference on Computer-Aided Design
Dual partitioning multicasting for high-performance on-chip networks
Journal of Parallel and Distributed Computing
Performance and power profiling for emulated Android systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Unified reliability estimation and management of NoC based chip multiprocessors
Microprocessors & Microsystems
Measuring GPU Power with the K20 Built-in Sensor
Proceedings of Workshop on General Purpose Processing Using GPUs
Hi-index | 0.02 |
This paper introduces McPAT, an integrated power, area, and timing modeling framework that supports comprehensive design space exploration for multicore and manycore processor configurations ranging from 90nm to 22nm and beyond. At the microarchitectural level, McPAT includes models for the fundamental components of a chip multiprocessor, including in-order and out-of-order processor cores, networks-on-chip, shared caches, integrated memory controllers, and multiple-domain clocking. At the circuit and technology levels, McPAT supports critical-path timing modeling, area modeling, and dynamic, short-circuit, and leakage power modeling for each of the device types forecast in the ITRS roadmap including bulk CMOS, SOI, and doublegate transistors. McPAT has a flexible XML interface to facilitate its use with many performance simulators. Combined with a performance simulator, McPAT enables architects to consistently quantify the cost of new ideas and assess tradeoffs of different architectures using new metrics like energy-delay-area2 product (EDA2P) and energy-delay-area product (EDAP). This paper explores the interconnect options of future manycore processors by varying the degree of clustering over generations of process technologies. Clustering will bring interesting tradeoffs between area and performance because the interconnects needed to group cores into clusters incur area overhead, but many applications can make good use of them due to synergies of cache sharing. Combining power, area, and timing results of McPAT with performance simulation of PARSEC benchmarks at the 22nm technology node for both common in-order and out-of-order manycore designs shows that when die cost is not taken into account clustering 8 cores together gives the best energy-delay product, whereas when cost is taken into account configuring clusters with 4 cores gives the best EDA2P and EDAP.