In the face of the memory wall, even high-bandwidth systems such as GPUs require efficient handling of memory accesses and memory-related instructions. Until now, memory performance of GPGPU applications has only been considered at the source-code level. This is not enough when optimizing an application for high performance: the code has to be optimized at the assembly level as well. As GPGPU-capable hardware spreads into ever smaller devices, the energy consumption of a program becomes, alongside its performance, an important optimization goal. In this paper, a novel compiler optimization technique called FALIS (Feedback-based and memory-Aware gLobal Instruction Scheduling) is presented, based on global instruction scheduling and multi-objective genetic algorithms. The approach uses profiling-based feedback so that measured performance and energy consumption values can be taken into account inside the compiler. Profiling on the real hardware platform is important in order to capture the characteristics of the underlying hardware. FALIS increases the runtime performance of a GPGPU application by up to 13.02% and decreases energy consumption by up to 10.23%.
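The core loop of such an approach can be illustrated with a minimal sketch: candidate instruction schedules are evolved by a genetic algorithm, each candidate is scored on two objectives (runtime and energy, which in FALIS come from profiling on real hardware), and Pareto-nondominated schedules survive. The cost model below (`profile`) is a synthetic stand-in, not the paper's measurement setup, and all names are illustrative assumptions.

```python
import random

def dominates(a, b):
    # a dominates b if it is no worse in every objective and strictly better in one
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def profile(schedule):
    # Stand-in for profiling on real hardware: returns (runtime, energy).
    # A synthetic cost model that penalizes instructions placed far from
    # their original position; a real system would measure the GPU instead.
    runtime = sum(abs(pos - instr) for pos, instr in enumerate(schedule))
    energy = sum((pos - instr) ** 2 for pos, instr in enumerate(schedule))
    return runtime, energy

def mutate(schedule, rng):
    # Swap two instructions to produce a neighboring schedule
    s = list(schedule)
    i, j = rng.sample(range(len(s)), 2)
    s[i], s[j] = s[j], s[i]
    return tuple(s)

def pareto_front(population):
    # Keep only schedules not dominated by any other schedule
    return [p for p in population
            if not any(dominates(profile(q), profile(p))
                       for q in population if q != p)]

def optimize(n_instr=8, pop_size=20, generations=50, seed=0):
    rng = random.Random(seed)
    base = list(range(n_instr))
    pop = {tuple(rng.sample(base, n_instr)) for _ in range(pop_size)}
    for _ in range(generations):
        offspring = {mutate(s, rng) for s in pop}
        pop = set(pareto_front(list(pop | offspring)))
        # Refill with random schedules to maintain diversity
        while len(pop) < pop_size:
            pop.add(tuple(rng.sample(base, n_instr)))
    return pareto_front(list(pop))
```

A real implementation would replace the swap mutation with dependence-preserving reordering (only legal schedules may be generated) and would feed measured runtime and energy back into the fitness evaluation, which is the feedback loop the paper describes.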