Power consumption is an important concern for future billion-transistor designs. This paper proposes a novel technique for optimizing the power consumption of chip multiprocessors (CMPs) using an integrated hardware-software mechanism. Using a high-level synchronization construct, the barrier, our technique tracks the idle time each processor spends waiting for the other processors to reach the same point in the program. With this knowledge, the frequency of each processor can be modulated to reduce or eliminate these idle times, providing power savings without compromising performance. Using real applications from the SpecOMP suite and a complete system CMP simulator, we demonstrate that this approach can provide as much as 40% power savings (32% on average across five applications) with little impact on performance.
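
The barrier-driven frequency modulation described above can be illustrated with a minimal Python sketch. It is not the paper's actual policy, which the abstract does not specify. It assumes that a processor's time to reach the barrier scales inversely with its clock frequency (memory-bound phases would not behave this way) and that only a small set of discrete frequency levels is available. The function names, the level values, and the example timings are all hypothetical.

```python
def scaled_frequencies(busy_times, f_max, levels):
    """Pick a frequency per processor so that early arrivals slow down.

    busy_times: time each processor needs to reach the barrier at f_max.
    levels: available discrete frequencies (hypothetical values).
    Assumption: time-to-barrier is inversely proportional to frequency.
    """
    critical = max(busy_times)  # the slowest processor sets the pace
    chosen = []
    for busy in busy_times:
        # Ideal frequency at which this processor arrives with the slowest one.
        target = f_max * (busy / critical)
        # Round up to the lowest available level so it never arrives late.
        candidates = [level for level in levels if level >= target]
        chosen.append(min(candidates) if candidates else max(levels))
    return chosen


def idle_times(busy_times, f_max, freqs):
    """Idle time each processor spends at the barrier under the given frequencies."""
    run = [busy * f_max / f for busy, f in zip(busy_times, freqs)]
    finish = max(run)
    return [finish - r for r in run]


# Hypothetical example: three processors, maximum frequency 2.0 (arbitrary units).
busy = [10.0, 8.0, 5.0]
levels = [1.0, 1.25, 1.5, 1.75, 2.0]

baseline_idle = idle_times(busy, 2.0, [2.0, 2.0, 2.0])
freqs = scaled_frequencies(busy, 2.0, levels)   # [2.0, 1.75, 1.0]
scaled_idle = idle_times(busy, 2.0, freqs)

print("baseline idle:", baseline_idle)
print("frequencies:  ", freqs)
print("scaled idle:  ", scaled_idle)
```

In this example the slowest processor keeps the maximum frequency, so the barrier is released at the same time as before, while the other two processors run slower and spend less time idle. Rounding up to the next available level is a conservative choice: it leaves some residual idle time but avoids delaying the barrier.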