Towards more efficient execution: a decoupled access-execute approach

Authors:
Konstantinos Koukos;David Black-Schaffer;Vasileios Spiliopoulos;Stefanos Kaxiras
Affiliations:
Uppsala University, Uppsala, Sweden;Uppsala University, Uppsala, Sweden;Uppsala University, Uppsala, Sweden;Uppsala University, Uppsala, Sweden
Venue:
Proceedings of the 27th ACM International Conference on Supercomputing (ICS '13)
Year:
2013

Abstract

The end of Dennard scaling is expected to shrink the range of DVFS in future nodes, limiting the energy savings of this technique. This paper evaluates how much we can increase the effectiveness of DVFS by using a software decoupled access-execute approach. Decoupling data access from execution allows us to select the optimal voltage-frequency setting for each phase and thereby improve energy efficiency over standard coupled execution. The underlying insight of our work is that, by decoupling access and execute, we can exploit the memory-bound nature of the access phase and the compute-bound nature of the execute phase to optimize power efficiency while maintaining good performance. To demonstrate this, we built a task-based parallel execution infrastructure consisting of: (1) a runtime system to orchestrate the execution, (2) power models to predict the optimal voltage-frequency selection at runtime, (3) a modeling infrastructure based on hardware measurements to simulate zero-latency, per-core DVFS, and (4) a hardware measurement infrastructure to verify our model's accuracy. Based on real hardware measurements, we project that the combination of decoupled access-execute and DVFS has the potential to improve EDP by 25% without hurting performance. On memory-bound applications, we significantly improve performance due to increased MLP in the access phase and ILP in the execute phase. Furthermore, we demonstrate that our method achieves high performance both with and without a hardware prefetcher.
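
The per-phase frequency selection described in the abstract can be illustrated with a toy model. The sketch below is not the paper's power model or runtime: the operating points, the cubic dynamic-power law, the static-power constant, and the per-phase workload numbers are all assumptions made up for illustration. It also assumes zero-latency frequency switches and ignores the MLP and ILP gains the abstract mentions. It only shows why a memory-bound access phase and a compute-bound execute phase can prefer different frequencies when the target is the energy-delay product (EDP, energy multiplied by execution time).

```python
from itertools import product

# Hypothetical DVFS operating points and power constants (assumptions).
FREQS_GHZ = [1.6, 2.0, 2.4, 2.8, 3.2, 3.4]
P_STATIC_W = 40.0   # assumed frequency-independent power
K_DYN = 1.0         # assumed dynamic-power coefficient, W per GHz^3


def power(f):
    """Toy power model: static power plus dynamic power growing as f^3."""
    return P_STATIC_W + K_DYN * f ** 3


def phase_time(gcycles, mem_s, f):
    """Core work scales with 1/f; memory stall time is frequency-independent."""
    return gcycles / f + mem_s


def edp(phases, freqs):
    """EDP of running each phase at its own frequency, back to back."""
    times = [phase_time(g, m, f) for (g, m), f in zip(phases, freqs)]
    energy = sum(power(f) * t for f, t in zip(freqs, times))
    return energy * sum(times)


# (giga-cycles of core work, seconds of memory stall); made-up numbers.
ACCESS = (0.2, 1.0)    # memory-bound phase
EXECUTE = (3.2, 0.0)   # compute-bound phase
PHASES = [ACCESS, EXECUTE]

# Coupled: one frequency for the whole task.
coupled_f = min(FREQS_GHZ, key=lambda f: edp(PHASES, (f, f)))
# Decoupled: an independent frequency for each phase.
decoupled_f = min(product(FREQS_GHZ, repeat=2), key=lambda fs: edp(PHASES, fs))

coupled_edp = edp(PHASES, (coupled_f, coupled_f))
decoupled_edp = edp(PHASES, decoupled_f)
print("coupled:", coupled_f, "GHz, EDP", round(coupled_edp, 1))
print("decoupled (access, execute):", decoupled_f, "GHz, EDP", round(decoupled_edp, 1))
```

Under these assumed constants, the search runs the access phase at the lowest operating point and the execute phase at the highest, while the coupled configuration settles on a compromise frequency. Because the decoupled search space includes every coupled setting, its EDP can never be worse in this model. The size of the gap depends entirely on the made-up constants and says nothing about the 25% figure reported in the abstract.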