Bulldog: a compiler for VLSI architectures
Bulldog: a compiler for VLSI architectures
Compilers: principles, techniques, and tools
Compilers: principles, techniques, and tools
The program dependence graph and its use in optimization
ACM Transactions on Programming Languages and Systems (TOPLAS)
Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Limits on multiple instruction issue
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
“Combining” as a compilation technique for VLIW architectures
MICRO 22 Proceedings of the 22nd annual workshop on Microprogramming and microarchitecture
Region Scheduling: An Approach for Detecting and Redistributing Parallelism
IEEE Transactions on Software Engineering
Instruction scheduling beyond basic blocks
IBM Journal of Research and Development
Selected papers of the second workshop on Languages and compilers for parallel computing
Circular scheduling: a new technique to perform software pipelining
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Global instruction scheduling for superscalar machines
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
IMPACT: an architectural framework for multiple-instruction-issue processors
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Avoiding unconditional jumps by code replication
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Optimally profiling and tracing programs
POPL '92 Proceedings of the 19th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
An architectural framework for migration from CISC to higher performance platforms
ICS '92 Proceedings of the 6th international conference on Supercomputing
An efficient resource-constrained global scheduling technique for superscalar and VLIW processors
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Proving safety of speculative load instructions at compile-time
ESOP'92 Symposium proceedings on 4th European symposium on programming
POPL '94 Proceedings of the 21st ACM SIGPLAN-SIGACT symposium on Principles of programming languages
A compilation technique for software pipelining of loops with conditional jumps
MICRO 20 Proceedings of the 20th annual workshop on Microprogramming
A global resource-constrained parallelization technique
ICS '89 Proceedings of the 3rd international conference on Supercomputing
The Design and Analysis of Computer Algorithms
The Design and Analysis of Computer Algorithms
Making Compaction-Based Parallelization Affordable
IEEE Transactions on Parallel and Distributed Systems
A reduced multipipeline machine description that preserves scheduling constraints
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Source-level debugging of scalar optimized code
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Scalable instruction-level parallelism through tree-instructions
ICS '97 Proceedings of the 11th international conference on Supercomputing
Resource-sensitive profile-directed data flow analysis for code optimization
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Parallelizing nonnumerical code with selective scheduling and software pipelining
ACM Transactions on Programming Languages and Systems (TOPLAS)
Binary translation and architecture convergence issues for IBM system/390
Proceedings of the 14th international conference on Supercomputing
A Prolog Tailoring Technique on an Epilog Tailored Procedure
PSI '02 Revised Papers from the 4th International Andrei Ershov Memorial Conference on Perspectives of System Informatics: Akademgorodok, Novosibirsk, Russia
Precise Exception Semantics in Dynamic Compilation
CC '02 Proceedings of the 11th International Conference on Compiler Construction
A compiler approach for reducing data cache energy
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Instruction Scheduling for Low Power
Journal of VLSI Signal Processing Systems
Efficient instruction scheduling for a pipelined architecture
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Reducing data cache leakage energy using a compiler-based approach
ACM Transactions on Embedded Computing Systems (TECS)
How many threads to spawn during program multithreading?
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
Hi-index | 0.00 |
We describe techniques for converting the intermediate code representation of a given program, as generated by a modern compiler, to another representation which produces the same run-time results, but can run faster on a superscalar machine. The algorithms, based on novel parallelization techniques for Very Long Instruction Word (VLIW) architectures, find and place together independently executable operations that may be far apart in the original code. i.e., they may be separated by many conditional branches or belong to different iterations of a loop. As a result, the functional units in the superscalar are presented with more work that can proceed in parallel, thus achieving higher performance than the approach of using hardware instruction dispatch techniques alone.While general scheduling techniques improve performance by removing idle pipeline cycles, to further improve performance on a superscalar with only a few functional units requires a reduction in the pathlength. We have designed a set of new algorithms for reducing pathlength and removing stalls due to branches, namely speculative load-store motion out of loops, unspeculation, limited combining, basic block expansion, and prolog tailoring. These algorithms were implemented in a prototype version of the IBM RS/6000 xlc compiler and have shown significant improvement in SPEC integer benchmarks on the IBM POWER machines.Also, we describe a new technique to obtain profiling information with low overhead, and some applications of profiling directed feedback, including scheduling heuristics, code reordering and branch reversal.