Compilers: principles, techniques, and tools
Compilers: principles, techniques, and tools
Advanced compiler optimizations for supercomputers
Communications of the ACM - Special issue on parallelism
The program dependence graph and its use in optimization
ACM Transactions on Programming Languages and Systems (TOPLAS)
A VLIW architecture for a trace Scheduling Compiler
IEEE Transactions on Computers - Special issue on architectural support for programming languages and operating systems
Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Available instruction-level parallelism for superscalar and superpipelined machines
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Limits on multiple instruction issue
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
“Combining” as a compilation technique for VLIW architectures
MICRO 22 Proceedings of the 22nd annual workshop on Microprogramming and microarchitecture
Region Scheduling: An Approach for Detecting and Redistributing Parallelism
IEEE Transactions on Software Engineering
Selected papers of the second workshop on Languages and compilers for parallel computing
Circular scheduling: a new technique to perform software pipelining
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Global instruction scheduling for superscalar machines
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Efficiently computing static single assignment form and the control dependence graph
ACM Transactions on Programming Languages and Systems (TOPLAS)
Predicting conditional branch directions from previous runs of a program
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Efficient superscalar performance through boosting
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Communications of the ACM
Instruction-level parallel processing: history, overview, and perspective
The Journal of Supercomputing - Special issue on instruction-level parallelism
The Journal of Supercomputing - Special issue on instruction-level parallelism
The superblock: an effective technique for VLIW and superscalar compilation
The Journal of Supercomputing - Special issue on instruction-level parallelism
VLIW compilation techniques in a superscalar environment
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Avoidance and suppression of compensation code in a trace scheduling compiler
ACM Transactions on Programming Languages and Systems (TOPLAS)
Compile-time parallelization of non-numerical code: VLIW superscalar
Compile-time parallelization of non-numerical code: VLIW superscalar
Reduced instruction set computers
Communications of the ACM - Special section on computer architecture
Increasing cache bandwidth using multi-port caches for exploiting ILP in non-numerical code
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Performance issues in correlated branch prediction schemes
Proceedings of the 28th annual international symposium on Microarchitecture
A study on the number of memory ports in multiple instruction issue machines
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Performance analysis of tree VLIW architecture for exploiting branch ILP in non-numerical code
ICS '97 Proceedings of the 11th international conference on Supercomputing
Evaluation of scheduling techniques on a SPARC-based VLIW testbed
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Performance and the i860 Microprocessor
IEEE Micro
Making Compaction-Based Parallelization Affordable
IEEE Transactions on Parallel and Distributed Systems
Generalized Multiway Branch Unit for VLIW Microprocessors
IEEE Transactions on Parallel and Distributed Systems
A Development Environment for Horizontal Microcode
IEEE Transactions on Software Engineering
MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
Bulldog: a compiler for vliw architectures (parallel computing, reduced-instruction-set, trace scheduling, scientific)
Performance analysis of tree VLIW architecture for exploiting branch ILP in non-numerical code
ICS '97 Proceedings of the 11th international conference on Supercomputing
Evaluation of scheduling techniques on a SPARC-based VLIW testbed
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Split-path enhanced pipeline scheduling for loops with control flows
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Unroll-based register coalescing
Proceedings of the 14th international conference on Supercomputing
Optimal software pipelining of loops with control flows
ICS '02 Proceedings of the 16th international conference on Supercomputing
Path Analysis and Renaming for Predicated Instruction Scheduling
International Journal of Parallel Programming
Unroll-Based Copy Elimination for Enhanced Pipeline Scheduling
IEEE Transactions on Computers
Unroll-Based Copy Elimination for Enhanced Pipeline Scheduling
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Global Software Pipelining with Iteration Preselection
CC '00 Proceedings of the 9th International Conference on Compiler Construction
A First Step Towards Time Optimal Software Pipelining of Loops with Control Flows
CC '01 Proceedings of the 10th International Conference on Compiler Construction
Split-Path Enhanced Pipeline Scheduling
IEEE Transactions on Parallel and Distributed Systems
A compiler approach for reducing data cache energy
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
Single-Dimension Software Pipelining for Multi-Dimensional Loops
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Time optimal software pipelining of loops with control flows
International Journal of Parallel Programming
Optimistic register coalescing
ACM Transactions on Programming Languages and Systems (TOPLAS)
Reducing data cache leakage energy using a compiler-based approach
ACM Transactions on Embedded Computing Systems (TECS)
Single-dimension software pipelining for multidimensional loops
ACM Transactions on Architecture and Code Optimization (TACO)
Construction of speculative optimization algorithms
Programming and Computing Software
Rotating register allocation with multiple rotating branches
Proceedings of the 22nd annual international conference on Supercomputing
Register allocation for software pipelined multidimensional loops
ACM Transactions on Programming Languages and Systems (TOPLAS)
Analysis of imperative XML programs
Information Systems
Analysis of imperative XML programs
DBPL'07 Proceedings of the 11th international conference on Database programming languages
LCPC'05 Proceedings of the 18th international conference on Languages and Compilers for Parallel Computing
Just-In-Time Software Pipelining
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
CAeSaR: unified cluster-assignment scheduling and communication reuse for clustered VLIW processors
Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
Hi-index | 0.00 |
Instruction-level parallelism (ILP) in nonnumerical code is regarded as scarce and hard to exploit due to its irregularity. In this article, we introduce a new code-scheduling technique for irregular ILP called “selective scheduling” which can be used as a component for superscalar and VLIW compilers. Selective scheduling can compute a wide set of independent operations across all execution paths based on renaming and forward-substitution and can compute available operations across loop iterations if combined with software pipelining. This scheduling approach has better heuristics for determining the usefulness of moving one operation versus moving another and can successfully find useful code motions without resorting to branch profiling. The compile-time overhead of selective scheduling is low due to its incremental computation technique and its controlled code duplication. We parallelized the SPEC integer benchmarks and five AIX utilities without using branch probabilities. The experiments indicate that a fivefold speedup is achievable on realistic resources with a reasonable overhead in compilation time and code expansion and that a solid speedup increase is also obtainable on machines with fewer resources. These results improve previously known characteristics of irregular ILP.