In the face of the memory wall, even high-bandwidth systems such as GPUs require efficient handling of memory accesses and memory-related instructions. Until now, memory performance of GPGPU applications has only been considered at the source-code level. This is not enough when optimizing an application for high performance: the code has to be optimized at the assembly level as well. As GPGPU-capable hardware spreads into ever smaller devices, the energy consumption of a program becomes, alongside its performance, an important optimization goal. In this paper, a novel compiler optimization technique called FALIS (Feedback-based and memory-Aware gLobal Instruction Scheduling) is presented, based on global instruction scheduling and multi-objective genetic algorithms. The approach uses profiling-based feedback so that measured performance and energy consumption values can be taken into account inside the compiler. Profiling on the real hardware platform is important in order to capture the characteristics of the underlying hardware. FALIS increases the runtime performance of a GPGPU application by up to 13.02% and decreases energy consumption by up to 10.23%.
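The core loop of such an approach can be illustrated with a minimal sketch: candidate instruction schedules are evolved by a genetic algorithm, each candidate is scored on two objectives (runtime and energy, which in FALIS come from profiling on real hardware), and Pareto-nondominated schedules survive. The cost model below (`profile`) is a synthetic stand-in, not the paper's measurement setup, and all names are illustrative assumptions.

```python
import random

def dominates(a, b):
    # a dominates b if it is no worse in every objective and strictly better in one
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def profile(schedule):
    # Stand-in for profiling on real hardware: returns (runtime, energy).
    # A synthetic cost model that penalizes instructions placed far from
    # their original position; a real system would measure the GPU instead.
    runtime = sum(abs(pos - instr) for pos, instr in enumerate(schedule))
    energy = sum((pos - instr) ** 2 for pos, instr in enumerate(schedule))
    return runtime, energy

def mutate(schedule, rng):
    # Swap two instructions to produce a neighboring schedule
    s = list(schedule)
    i, j = rng.sample(range(len(s)), 2)
    s[i], s[j] = s[j], s[i]
    return tuple(s)

def pareto_front(population):
    # Keep only schedules not dominated by any other schedule
    return [p for p in population
            if not any(dominates(profile(q), profile(p))
                       for q in population if q != p)]

def optimize(n_instr=8, pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    base = list(range(n_instr))
    pop = {tuple(rng.sample(base, n_instr)) for _ in range(pop_size)}
    for _ in range(generations):
        offspring = {mutate(s, rng) for s in pop}
        pop = set(pareto_front(list(pop | offspring)))
        # Refill with random schedules to maintain diversity
        while len(pop) < pop_size:
            pop.add(tuple(rng.sample(base, n_instr)))
    return pareto_front(list(pop))
```

A real implementation would replace the swap mutation with dependence-preserving reordering (only legal schedules may be generated) and would feed measured runtime and energy back into the fitness evaluation, which is the feedback loop the paper describes.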