In this paper we propose and evaluate a post-link optimization that increases instruction-level parallelism (ILP) by moving instructions from a basic block into its preceding blocks. The Grid ALU Processor used for the evaluation contains many functional units that the original instruction stream does not fully occupy. The proposed technique uses these unallocated functional units to perform operations speculatively, ahead of time. The algorithm moves instructions into multiple predecessors of a source block. Where necessary, it adds compensation code so that the moved instructions write to unused registers, whose values are copied into the original target registers once the speculation is resolved. Evaluations of the algorithm show a maximum speedup of 2.08 on the Grid ALU Processor over the unoptimized version of the same program, owing to better exploitation of ILP and an optimized mapping of loops.
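The core transformation can be illustrated with a minimal sketch. It is not the algorithm evaluated in the paper: the `Instr` and `Block` types, the `hoist_first` function, the `must_preserve` set (registers that must keep their value on other paths) and the list of side-effecting opcodes are all assumptions made for illustration. The sketch moves the first instruction of a block into every predecessor, renames its destination to an unused register when needed, and leaves a copy in the source block as compensation code.

```python
from dataclasses import dataclass, field, replace
from typing import List, Optional, Tuple

# Assumption: these opcodes are never executed speculatively in this sketch.
SIDE_EFFECT_OPS = {"load", "store", "branch", "call"}


@dataclass
class Instr:
    op: str
    dest: Optional[str]
    srcs: Tuple[str, ...] = ()


@dataclass
class Block:
    name: str
    body: List[Instr] = field(default_factory=list)
    preds: List["Block"] = field(default_factory=list)


def ends_in_branch(block):
    return bool(block.body) and block.body[-1].op == "branch"


def hoist_first(block, free_regs, must_preserve):
    """Move block.body[0] into all predecessors; return True on success."""
    if not block.body or not block.preds:
        return False
    instr = block.body[0]
    if instr.dest is None or instr.op in SIDE_EFFECT_OPS:
        return False

    # Renaming is needed if the destination is still live on another path,
    # or if a predecessor's final branch reads it.
    branch_reads_dest = any(
        ends_in_branch(p) and instr.dest in p.body[-1].srcs for p in block.preds
    )
    if instr.dest in must_preserve or branch_reads_dest:
        if not free_regs:
            return False  # no unused register available, give up
        tmp = free_regs.pop()
        hoisted = replace(instr, dest=tmp)
        # Compensation code: commit the speculative result at block entry.
        block.body[0] = Instr("mov", instr.dest, (tmp,))
    else:
        hoisted = instr
        del block.body[0]

    for pred in block.preds:
        # Insert before the terminating branch, or at the end if none.
        pos = len(pred.body) - 1 if ends_in_branch(pred) else len(pred.body)
        pred.body.insert(pos, replace(hoisted))
    return True
```

Because the moved instruction is the first one in its block, its source operands hold the same values at the end of each predecessor, so no further dependence check is needed in this simplified setting. A real implementation would also need liveness analysis to compute `must_preserve` and a cost model to decide whether a free functional unit is actually available.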