Improving balanced scheduling with compiler optimizations that increase instruction-level parallelism

Authors:
Jack L. Lo;Susan J. Eggers
Affiliations:
Department of Computer Science and Engineering, University of Washington;Department of Computer Science and Engineering, University of Washington
Venue:
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Year:
1995

Citing 12
Cited 5

Efficient instruction scheduling for a pipelined architecture

SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
New CPU benchmark suites from SPEC

COMPCON '92 Proceedings of the thirty-seventh international conference on COMPCON
Design and evaluation of a compiler algorithm for prefetching

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Balanced scheduling: instruction scheduling when memory latency is uncertain

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
The multiflow trace scheduling compiler

The Journal of Supercomputing - Special issue on instruction-level parallelism
Complexity/performance tradeoffs with non-blocking loads

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Postpass Code Optimization of Pipeline Constraints

ACM Transactions on Programming Languages and Systems (TOPLAS)
Parallel processing: a smart compiler and a dumb machine

SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
Organization of the Motorola 88110 Superscalar RISC Microprocessor

IEEE Micro
Performance Features of the PA7100 Microprocessor

IEEE Micro
The Alpha AXP Architecture and 21064 Processor

IEEE Micro
Lockup-free instruction fetch/prefetch cache organization

ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture

Code transformations to improve memory parallelism

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Changing Interaction of Compiler and Architecture

Computer
Load Scheduling with Profile Information

Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Balanced scheduling: instruction scheduling when memory latency is uncertain

ACM SIGPLAN Notices - Best of PLDI 1979-1999
Instruction scheduling for a tiled dataflow architecture

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Traditional list schedulers order instructions based on an optimistic estimate of the load latency imposed by the hardware and therefore cannot respond to variations in memory latency caused by cache hits and misses on non-blocking architectures. In contrast, balanced scheduling schedules instructions based on an estimate of the amount of instruction-level parallelism in the program. By scheduling independent instructions behind loads based on what the program can provide, rather than what the implementation stipulates in the best case (i.e., a cache hit), balanced scheduling can hide variations in memory latencies more effectively.Since its success depends on the amount of instruction-level parallelism in the code, balanced scheduling should perform even better when more parallelism is available. In this study, we combine balanced scheduling with three compiler optimizations that increase instruction-level parallelism: loop unrolling, trace scheduling and cache locality analysis. Using code generated for the DEC Alpha by the Multiflow compiler, we simulated a non-blocking processor architecture that closely models the Alpha 21164. Our results show that balanced scheduling benefits from all three optimizations, producing average speedups that range from 1.15 to 1.40, across the optimizations. More importantly, because of its ability to tolerate variations in load interlocks, it improves its advantage over traditional scheduling. Without the optimizations, balanced scheduled code is, on average, 1.05 times faster than that generated by a traditional scheduler; with them, its lead increases to 1.18.