Compilers: principles, techniques, and tools
Compilers: principles, techniques, and tools
Selected papers of the second workshop on Languages and compilers for parallel computing
A compilation technique for software pipelining of loops with conditional jumps
MICRO 20 Proceedings of the 20th annual workshop on Microprogramming
Structure of Computers and Computations
Structure of Computers and Computations
Percolation Scheduling: A Parallel Compilation Technique
Percolation Scheduling: A Parallel Compilation Technique
Software pipelining: an evaluation of enhanced pipelining
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
A dynamic-programming technique for compacting loops
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Compiler code transformations for superscalar-based high performance systems
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
VLIW compilation techniques in a superscalar environment
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Instruction scheduling in the TOBEY compiler
IBM Journal of Research and Development
ACM Computing Surveys (CSUR)
Resource-Constrained Software Pipelining
IEEE Transactions on Parallel and Distributed Systems
Unrolling-based optimizations for modulo scheduling
Proceedings of the 28th annual international symposium on Microarchitecture
Software pipelining: a comparison and improvement
MICRO 23 Proceedings of the 23rd annual workshop and symposium on Microprogramming and microarchitecture
Using a lookahead window in a compaction-based parallelizing compiler
MICRO 23 Proceedings of the 23rd annual workshop and symposium on Microprogramming and microarchitecture
A study on the number of memory ports in multiple instruction issue machines
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Parallelizing nonnumerical code with selective scheduling and software pipelining
ACM Transactions on Programming Languages and Systems (TOPLAS)
Optimizations and oracle parallelism with dynamic translation
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Making Compaction-Based Parallelization Affordable
IEEE Transactions on Parallel and Distributed Systems
Software Pipelining: Petri Net Pacemaker
PACT '93 Proceedings of the IFIP WG10.3. Working Conference on Architectures and Compilation Techniques for Fine and Medium Grain Parallelism
Instruction combining for coalescing memory accesses using global code motion
MSP '04 Proceedings of the 2004 workshop on Memory system performance
Using a lookahead window in a compaction-based parallelizing compiler
ACM SIGMICRO Newsletter
Hi-index | 0.00 |
Combining is a local compiler optimization technique that can enhance the performance of global compaction techniques for VLIW machines. Given two adjacent operations of a certain class that are flow (read-after-write) dependent and that cannot be placed in the same micro-instruction, the combining technique can transform the operations so that the modified operations have no dependence. The transformed operations can be executed in the same micro-instruction, thus allowing the total execution time of the program to be reduced. In this paper, combining a pair of flow-dependent operations into a wide instruction word is suggested as an important compilation technique for VLIW architectures. Combining is particularly effective with software pipelining and loop unrolling since combinable operations can come together with a higher probability when these compilation techniques are used. We implemented combining in our parallelizing compiler for the wide instruction word architecture, which is now being built at the IBM T. J. Watson Research Center. It is shown that ten percent speedup is obtained on the Stanford integer benchmarks and other sequential-matured C programs, in comparison to compaction techniques that do not use combining. For a class of inner loops, combining can remove the inter-iteration dependencies completely and can improve performance in the same ratio as the loop is unrolled.