A study of scalar compilation techniques for pipelined supercomputers

Authors:
Shlomo Weiss; James E. Smith
Affiliations:
Microelectronics and Computer Technology Corp., Austin, TX; Univ. of Wisconsin, Madison
Venue:
ASPLOS II Proceedings of the second international conference on Architectural support for programming languages and operating systems
Year:
1987

Abstract

This paper studies two compilation techniques for enhancing scalar performance in high-speed scientific processors: software pipelining and loop unrolling. We study the impact of the architecture (size of the register file) and of the hardware (size of instruction buffer) on the efficiency of loop unrolling. We also develop a methodology for classifying software pipelining techniques. For loop unrolling, a straightforward scheduling algorithm is shown to produce near-optimal results when not inhibited by recurrences or memory hazards. Software pipelining requires less hardware but also achieves less speedup. Finally, we show that the performance produced with a modified CRAY-1S scalar architecture and a code scheduler utilizing loop unrolling is comparable to the performance achieved by the CRAY-1S with a vector unit and the CFT vectorizing compiler.
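
The two transformations named in the abstract can be illustrated on a small loop. What follows is a minimal sketch and not the paper's scheduler: the function names, the `a * x[i] + y[i]` loop body, the unroll factor of 4, and the two-stage pipeline split are all assumptions chosen for illustration. Python only shows how the loop is restructured; any speedup would come from a compiler scheduling the equivalent machine code on pipelined hardware.

```python
def saxpy_rolled(a, x, y):
    """Reference loop: one element per iteration."""
    out = [0.0] * len(x)
    for i in range(len(x)):
        out[i] = a * x[i] + y[i]
    return out


def saxpy_unrolled(a, x, y):
    """Loop unrolled by 4 (assumed factor), with a cleanup loop for leftovers."""
    n = len(x)
    out = [0.0] * n
    main = n - n % 4  # largest multiple of 4 not exceeding n
    for i in range(0, main, 4):
        # Four independent bodies per trip: a scheduler may interleave their
        # operations, at the cost of more live temporaries (registers) and a
        # longer loop body (instruction buffer space).
        t0 = a * x[i] + y[i]
        t1 = a * x[i + 1] + y[i + 1]
        t2 = a * x[i + 2] + y[i + 2]
        t3 = a * x[i + 3] + y[i + 3]
        out[i], out[i + 1], out[i + 2], out[i + 3] = t0, t1, t2, t3
    for i in range(main, n):  # remaining 0 to 3 elements
        out[i] = a * x[i] + y[i]
    return out


def saxpy_pipelined(a, x, y):
    """Software-pipelined form with two assumed stages: multiply, then add/store."""
    n = len(x)
    out = [0.0] * n
    if n == 0:
        return out
    p = a * x[0]  # prologue: start the multiply for iteration 0
    for i in range(1, n):
        # Kernel: the add/store of iteration i-1 and the multiply of
        # iteration i do not depend on each other, so they can overlap.
        out[i - 1] = p + y[i - 1]
        p = a * x[i]
    out[n - 1] = p + y[n - 1]  # epilogue: finish the last iteration
    return out
```

All three functions compute the same result. The unrolled version trades code size and temporaries for scheduling freedom, while the pipelined version keeps the loop body small and overlaps work from adjacent iterations, which is consistent with the abstract's remark that software pipelining needs less hardware.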