Rematerialization-based register allocation through reverse computing
Proceedings of the 8th ACM International Conference on Computing Frontiers
Reversible computing aims to keep all information about input and intermediate values available at every step of a computation, making that information virtually present everywhere. Rematerialization in register allocation amounts to recomputing values instead of spilling them to memory when registers run out. In this paper we detail a heuristic algorithm that exploits reverse computing for register rematerialization. This improves information locality, as it provides more opportunities for retrieving data. Rematerialization adds instructions, and we show on one specifically designed example that reverse computing may alleviate the impact of these additional instructions on performance. We also show how thread parallelism may be optimized on GPUs by performing register allocation with reverse recomputing, which increases the number of threads per Streaming Multiprocessor. This is done on the main kernel of a Lattice Quantum Chromodynamics simulation program, where we gain an 11% speedup.
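
The idea of recovering a value by reverse computing rather than spilling it can be illustrated with a minimal sketch. This is not the paper's heuristic algorithm; the function names `forward_step` and `recover_old_a` are hypothetical, and the example assumes integer operands so that the addition is exactly invertible.

```python
def forward_step(a, b):
    """Overwrite a with a + b, as an in-place add on a register would.

    After this step the old value of a is no longer held anywhere,
    unless it was spilled to memory beforehand.
    """
    a = a + b
    return a, b


def recover_old_a(a_new, b):
    """Reverse step: recompute the overwritten operand from live values.

    Because integer addition is invertible, old a == a_new - b, so one
    extra subtraction can replace a store/load pair (a spill).
    """
    return a_new - b


# Illustrative values only (assumed, not taken from the paper).
a_new, b = forward_step(7, 5)
old_a = recover_old_a(a_new, b)
```

In this sketch the old operand is rebuilt from the result and the other operand, both of which are still live. The same trick is not exact for every operation: floating-point addition, for instance, may lose low-order bits, so the reverse step would only approximate the original value.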