Constant propagation with conditional branches
ACM Transactions on Programming Languages and Systems (TOPLAS)
Efficiently computing static single assignment form and the control dependence graph
ACM Transactions on Programming Languages and Systems (TOPLAS)
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
A linear time algorithm for placing &phgr;-nodes
POPL '95 Proceedings of the 22nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Linear scan register allocation
ACM Transactions on Programming Languages and Systems (TOPLAS)
Journal of Algorithms
Complete register allocation problems
STOC '73 Proceedings of the fifth annual ACM symposium on Theory of computing
The history of FORTRAN I, II, and III
ACM SIGPLAN Notices - Special issue: History of programming languages conference
Compilers: Principles, Techniques, and Tools (2nd Edition)
Compilers: Principles, Techniques, and Tools (2nd Edition)
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Parallel Computing Experiences with CUDA
IEEE Micro
A control-structure splitting optimization for GPGPU
Proceedings of the 6th ACM conference on Computing frontiers
A study of replacement algorithms for a virtual-storage computer
IBM Systems Journal
Rodinia: A benchmark suite for heterogeneous computing
IISWC '09 Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC)
An adaptive performance modeling tool for GPU architectures
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
IEEE Micro
Debunking the 100X GPU vs. CPU myth: an evaluation of throughput computing on CPU and GPU
Proceedings of the 37th annual international symposium on Computer architecture
Understanding throughput-oriented architectures
Communications of the ACM
Ocelot: a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
On-the-fly elimination of dynamic irregularities for GPU computing
Proceedings of the sixteenth international conference on Architectural support for programming languages and operating systems
Reducing branch divergence in GPU programs
Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units
Register allocation after classical SSA elimination is NP-Complete
FOSSACS'06 Proceedings of the 9th European joint conference on Foundations of Software Science and Computation Structures
Divergence Analysis and Optimizations
PACT '11 Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques
Register allocation via coloring
Computer Languages
ACM Transactions on Programming Languages and Systems (TOPLAS)
Hi-index | 0.00 |
The Single Instruction, Multiple Data (SIMD) execution model has been receiving renewed attention recently. This awareness stems from the rise of graphics processing units (GPUs) as a powerful alternative for parallel computing. Many compiler optimizations have been recently proposed for this hardware, but register allocation is a field yet to be explored. In this context, this paper describes a register spiller for SIMD machines that capitalizes on the opportunity to share identical data between threads. It provides two different benefits: first, it uses less memory, as more spilled values are shared among threads. Second, it improves the access times to spilled values. We have implemented our proposed allocator in the Ocelot open source compiler, and have been able to speedup the code produced by this framework by 21%. Although we have designed our algorithm on top of a linear scan register allocator, we claim that our ideas can be easily adapted to fit the necessities of other register allocators.