Optimizing Array-Intensive Applications for On-Chip Multiprocessors
IEEE Transactions on Parallel and Distributed Systems
Exploiting Barriers to Optimize Power Consumption of CMPs
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Locality-conscious workload assignment for array-based computations in MPSOC architectures
Proceedings of the 42nd annual Design Automation Conference
RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence
Proceedings of the 32nd annual international symposium on Computer Architecture
Energy reduction in multiprocessor systems using transactional memory
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
Power-performance considerations of parallel computing on chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
Interconnect-Aware Coherence Protocols for Chip Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
Spin Detection Hardware for Improved Management of Multithreaded Systems
IEEE Transactions on Parallel and Distributed Systems
Reducing power through compiler-directed barrier synchronization elimination
Proceedings of the 2006 international symposium on Low power electronics and design
On the energy efficiency of synchronization primitives for shared-memory single-chip multiprocessors
Proceedings of the 17th ACM Great Lakes symposium on VLSI
Managing energy-performance tradeoffs for multithreaded applications on multiprocessor architectures
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
A memory-conscious code parallelization scheme
Proceedings of the 44th annual Design Automation Conference
Lightweight barrier-based parallelization support for non-cache-coherent MPSoC platforms
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Broader dynamic load balancing for hybrid/multi-level parallel programming models
International Journal of High Performance Computing and Networking
Energy efficient synchronization techniques for embedded architectures
Proceedings of the 18th ACM Great Lakes symposium on VLSI
Proceedings of the 13th international symposium on Low power electronics and design
Distributed and low-power synchronization architecture for embedded multiprocessors
CODES+ISSS '08 Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis
Meeting points: using thread criticality to adapt multicore hardware to parallel regions
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Low-complexity policies for energy-performance tradeoff in chip-multi-processors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Energy-optimal synchronization primitives for single-chip multi-processors
Proceedings of the 19th ACM Great Lakes symposium on VLSI
Adagio: making DVS practical for complex HPC applications
Proceedings of the 23rd international conference on Supercomputing
Proceedings of the 36th annual international symposium on Computer architecture
On the energy-efficiency of software transactional memory
Proceedings of the 22nd Annual Symposium on Integrated Circuits and System Design: Chip on the Dunes
ACM Transactions on Architecture and Code Optimization (TACO)
Thread criticality support in on-chip networks
Proceedings of the Third International Workshop on Network on Chip Architectures
Low-cost and energy-efficient distributed synchronization for embedded multiprocessors
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
LIME: a framework for debugging load imbalance in multi-threaded execution
Proceedings of the 33rd International Conference on Software Engineering
Scalable power control for many-core architectures running multi-threaded applications
Proceedings of the 38th annual international symposium on Computer architecture
Improving energy efficiency of multi-threaded applications using heterogeneous CMOS-TFET multicores
Proceedings of the 17th IEEE/ACM international symposium on Low-power electronics and design
Parallel application memory scheduling
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 39th Annual International Symposium on Computer Architecture
Exploring hardware overprovisioning in power-constrained, high performance computing
Proceedings of the 27th international ACM conference on International conference on supercomputing
Criticality stacks: identifying critical threads in parallel programs using synchronization behavior
Proceedings of the 40th Annual International Symposium on Computer Architecture
PAIS: Parallelism-aware interconnect scheduling in multicores
ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers
Hi-index | 0.00 |
Much research has been devoted to making microprocessors energy-efficient. However, little attention has been paid to multiprocessor environments where, due to the co-operative nature of the computation, the most energy-efficient execution in each processor may not translate into the most energy-efficient overall execution. We present the thrifty barrier, a hardware-software approach to saving energy in parallel applications that exhibit barrier synchronization imbalance. Threads that arrive early to a thrifty barrier pick among existing low-power processor sleep states based on predicted barrier stall time and other factors. We leverage the coherence protocol and propose small hardware extensions to achieve timely wake-up of these dormant threads, maximizing energy savings while minimizing the impact on performance.