Profile-driven instruction level parallel scheduling with application to super blocks

Authors:
C. Chekuri (Dept. of Comp. Sci., Stanford Univ., Stanford, CA)
R. Johnson (Hewlett Packard Labs, 1501 Page Mill Rd, Palo Alto, CA)
R. Motwani (Dept. of Comp. Sci., Stanford Univ., Stanford, CA)
B. Natarajan (Hewlett Packard Labs, 1501 Page Mill Rd, Palo Alto, CA)
B. R. Rau (Hewlett Packard Labs, 1501 Page Mill Rd, Palo Alto, CA)
M. Schlansker (Hewlett Packard Labs, 1501 Page Mill Rd, Palo Alto, CA)
Venue:
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Year:
1996

Abstract

Code scheduling to exploit instruction level parallelism (ILP) is a critical problem in compiler optimization research, given the increased use of long-instruction-word machines. Unfortunately, optimal scheduling is computationally intractable, so in practice one must resort to carefully crafted heuristics. If a scheduling heuristic is applied only within basic blocks, considerable performance may be lost at block boundaries. To overcome this obstacle, basic blocks can be coalesced across branches to form larger regions such as super blocks. In the literature, these regions are typically scheduled with algorithms that either ignore profile information (on the assumption that region formation has already fully exploited it) or use it only as an addendum to classical scheduling techniques. We believe that, even for simple linear code regions such as super blocks, further performance gains are possible by also using profile information during scheduling. We propose a general paradigm for converting any profile-insensitive list scheduler into a profile-sensitive one. Our technique is derived from a theoretical analysis of a simplified abstract model of profile-driven scheduling over arbitrary acyclic code regions, and this analysis yields a scoring measure for ranking branch instructions.
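The abstract does not state the scoring measure itself. The sketch below is therefore only an illustration of the general paradigm under explicit assumptions, not the authors' method. It rests on five assumptions:

- A super block is modeled as a DAG of operations with unit latencies and a single issue width `width`.
- Each branch carries a profiled exit probability.
- The objective is the profile-weighted (expected) exit cycle.
- The branch score is a hypothetical stand-in: exit probability divided by the number of operations the branch depends on, in the spirit of Smith's ratio rule for weighted completion time.
- Every operation inherits the best score of the branches that need it.

All names (`Op`, `branch_scores`, `op_priorities`, `heights`, `list_schedule`, `expected_exit_cycle`) are illustrative. The point is that the list scheduler itself is unchanged and only its priority function becomes profile-sensitive.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    preds: list[str] = field(default_factory=list)  # data/control predecessors
    exit_prob: float = 0.0  # profiled exit probability; > 0 only for branches


def cone(ops: dict[str, Op], name: str) -> set[str]:
    """Return `name` together with all of its transitive predecessors."""
    seen, stack = set(), [name]
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(ops[n].preds)
    return seen


def branch_scores(ops: dict[str, Op]) -> dict[str, float]:
    # Hypothetical score (NOT the paper's measure): exit probability per
    # operation that must complete before the branch can issue.
    return {n: op.exit_prob / len(cone(ops, n))
            for n, op in ops.items() if op.exit_prob > 0}


def op_priorities(ops: dict[str, Op]) -> dict[str, float]:
    # Profile-sensitive priority: best score among branches depending on the op.
    prio = {n: 0.0 for n in ops}
    for b, s in branch_scores(ops).items():
        for n in cone(ops, b):
            prio[n] = max(prio[n], s)
    return prio


def heights(ops: dict[str, Op]) -> dict[str, int]:
    # Profile-insensitive baseline priority: longest path (in ops) to any sink.
    succs: dict[str, list[str]] = {n: [] for n in ops}
    for n, op in ops.items():
        for p in op.preds:
            succs[p].append(n)
    memo: dict[str, int] = {}

    def height(n: str) -> int:
        if n not in memo:
            memo[n] = 1 + max((height(s) for s in succs[n]), default=0)
        return memo[n]

    return {n: height(n) for n in ops}


def list_schedule(ops, priority, width):
    """Cycle-by-cycle list scheduler, unit latencies, `width` ops per cycle."""
    cycle_of: dict[str, int] = {}
    t = 0
    while len(cycle_of) < len(ops):
        # An op is ready once every predecessor issued in an earlier cycle.
        ready = [n for n in ops if n not in cycle_of
                 and all(p in cycle_of and cycle_of[p] < t for p in ops[n].preds)]
        ready.sort(key=lambda n: (-priority[n], n))  # highest priority first
        for n in ready[:width]:
            cycle_of[n] = t
        t += 1
    return cycle_of


def expected_exit_cycle(ops, cycle_of):
    # Profile-weighted completion cycle of the exits (lower is better).
    return sum(op.exit_prob * (cycle_of[n] + 1) for n, op in ops.items())


# Toy super block: a likely early exit b1 and an unlikely final exit b2
# that also waits on a three-op chain c1 -> c2 -> c3.
ops = {o.name: o for o in [
    Op("a1"),
    Op("b1", ["a1"], exit_prob=0.9),
    Op("c1"), Op("c2", ["c1"]), Op("c3", ["c2"]),
    Op("b2", ["b1", "c3"], exit_prob=0.1),
]}
base = expected_exit_cycle(ops, list_schedule(ops, heights(ops), width=1))
prof = expected_exit_cycle(ops, list_schedule(ops, op_priorities(ops), width=1))
print(f"height-based: {base:.2f}, profile-driven: {prof:.2f}")
# height-based: 4.20, profile-driven: 2.40
```

The toy example has a single issue slot. Under the height-based priority, the scheduler spends early cycles on the long chain that feeds the unlikely exit, so the 90% exit `b1` issues in cycle 3. Under the profile-driven priority, `b1` issues in cycle 1. Only the priority function changes, which is the sense in which such a conversion can be generic. A realistic scheduler would also need to model latencies, resource classes, and the legality of speculation, all of which this sketch ignores.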