Wavefront scheduling: path based data representation and scheduling of subgraphs

Authors:
Jay Bharadwaj; Kishore Menezes; Chris McKinsey
Affiliations:
Intel Corporation, 3600 Juliette Lane, Santa Clara, CA; Intel Corporation, 3600 Juliette Lane, Santa Clara, CA; Star*Core Technology Center, 2100 Riveredge Parkway, Suite 600, Atlanta, GA
Venue:
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Year:
1999


Abstract

The IA-64 architecture is rich with features that enable aggressive exploitation of instruction-level parallelism. Features such as speculation, predication, and multiway branches give compilers new opportunities to extract parallelism from programs. Code scheduling is a central component of any compiler for the IA-64 architecture. This paper describes the implementation of the global code scheduler (GCS) in Intel's reference compiler for the IA-64 architecture. GCS schedules code over acyclic regions of control flow, and region formation is tightly coupled with region scheduling. GCS employs a new path-based data dependence representation that combines control flow and data dependence information to make data analysis easy and accurate; the paper provides details of this representation. The scheduler uses a novel instruction scheduling technique called wavefront scheduling. The concepts of wavefront scheduling and deferred compensation are explained to demonstrate the efficient generation of compensation code while scheduling. The paper also presents P-ready code motion, an opportunistic instruction-level tail duplication that aims to strike a balance between code expansion and performance potential. Performance results show an improvement in speedup of more than 30% for wavefront scheduling over basic block scheduling on the Merced microarchitecture.