Parallelizing nonnumerical code with selective scheduling and software pipelining

Authors: Soo-Mook Moon; Kemal Ebcioğlu
Affiliations: Seoul National Univ., Seoul, Korea; IBM T. J. Watson Research Center, Yorktown Heights, NY
Venue: ACM Transactions on Programming Languages and Systems (TOPLAS)
Year: 1997



Abstract

Instruction-level parallelism (ILP) in nonnumerical code is regarded as scarce and hard to exploit due to its irregularity. In this article, we introduce a new code-scheduling technique for irregular ILP called “selective scheduling” which can be used as a component for superscalar and VLIW compilers. Selective scheduling can compute a wide set of independent operations across all execution paths based on renaming and forward-substitution and can compute available operations across loop iterations if combined with software pipelining. This scheduling approach has better heuristics for determining the usefulness of moving one operation versus moving another and can successfully find useful code motions without resorting to branch profiling. The compile-time overhead of selective scheduling is low due to its incremental computation technique and its controlled code duplication. We parallelized the SPEC integer benchmarks and five AIX utilities without using branch probabilities. The experiments indicate that a fivefold speedup is achievable on realistic resources with a reasonable overhead in compilation time and code expansion and that a solid speedup increase is also obtainable on machines with fewer resources. These results improve previously known characteristics of irregular ILP.
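
The abstract names renaming and forward-substitution as the two transformations that let an operation move past a neighbor it would otherwise depend on. The sketch below is only an illustration of those two transformations as they are commonly defined, not the paper's algorithm: the `Op` record, the `move_above` helper, and the `_r` suffix used for renamed targets are all hypothetical names chosen for this example, and a real compiler would pick a free register rather than append a suffix.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Op:
    """A three-address operation: dest = opcode(srcs). Hypothetical IR."""
    dest: str
    opcode: str                 # e.g. "copy", "add", "mul"
    srcs: Tuple[str, ...]


def move_above(op: Op, blocker: Op) -> Optional[Tuple[Op, Optional[Op]]]:
    """Try to hoist `op` above `blocker`, the operation directly before it.

    Returns (hoisted_op, leftover_copy), where leftover_copy stays at the
    original position of `op`, or None when a true dependence that is not
    a plain copy prevents the move.
    """
    srcs = op.srcs

    # True (flow) dependence: op reads the value that blocker writes.
    if blocker.dest in srcs:
        if blocker.opcode != "copy":
            return None
        # Forward substitution: read the copy's source directly.
        srcs = tuple(blocker.srcs[0] if s == blocker.dest else s for s in srcs)

    dest, leftover = op.dest, None
    # Anti dependence (blocker reads op.dest) or output dependence
    # (both write the same name): rename the target and leave a copy.
    if op.dest in blocker.srcs or op.dest == blocker.dest:
        dest = op.dest + "_r"   # assumed-fresh name, for illustration only
        leftover = Op(op.dest, "copy", (dest,))

    return Op(dest, op.opcode, srcs), leftover


# y = x; z = y + one   ->   z = x + one may run before (or beside) y = x
substituted = move_above(Op("z", "add", ("y", "one")), Op("y", "copy", ("x",)))

# u = x * two; x = a + b   ->   x_r = a + b; u = x * two; x = x_r
renamed = move_above(Op("x", "add", ("a", "b")), Op("u", "mul", ("x", "two")))
```

In both cases the hoisted operation no longer conflicts with the operation it passed, so a scheduler could place the two in the same cycle on a machine with enough resources. Whether such a move is worthwhile is the job of the heuristics the abstract mentions, which this sketch does not model.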