Advanced compiler optimizations for supercomputers
Communications of the ACM - Special issue on parallelism
Very Long Instruction Word architectures and the ELI-512
ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
Code scheduling and register allocation in large basic blocks
ICS '88 Proceedings of the 2nd international conference on Supercomputing
On the Minimization of Loads/Stores in Local Register Allocation
IEEE Transactions on Software Engineering
Instruction scheduling beyond basic blocks
IBM Journal of Research and Development
A Performance Comparison of the IBM RS/6000 and the Astronautics ZS-1
Computer - Special issue on experimental research in computer architecture
The floating point performance of a superscalar SPARC processor
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Integrating register allocation and instruction scheduling for RISCs
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Circular scheduling: a new technique to perform software pipelining
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
IMPACT: an architectural framework for multiple-instruction-issue processors
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Comparing static and dynamic code scheduling for multiple-instruction-issue processors
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Performance evaluation for various configuration of superscalar processors
ACM SIGARCH Computer Architecture News
Effects of memory latencies on non-blocking processor/cache architectures
ICS '93 Proceedings of the 7th international conference on Supercomputing
A schedular-sensitive global register allocator
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Register allocation sensitive region scheduling
PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Improving instruction-level parallelism by loop unrolling and dynamic memory disambiguation
Proceedings of the 28th annual international symposium on Microarchitecture
Proceedings of the 28th annual international symposium on Microarchitecture
Techniques for extracting instruction level parallelism on MIMD architectures
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Resource-bounded partial evaluation
PEPM '97 Proceedings of the 1997 ACM SIGPLAN symposium on Partial evaluation and semantics-based program manipulation
Can program profiling support value prediction?
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The effect of instruction fetch bandwidth on value prediction
Proceedings of the 25th annual international symposium on Computer architecture
Experiences with Cooperating Register Allocation and Instruction Scheduling
International Journal of Parallel Programming
IMPACT: an architectural framework for multiple-instruction-issue processors
25 years of the international symposia on Computer architecture (selected papers)
Using value prediction to increase the power of speculative execution hardware
ACM Transactions on Computer Systems (TOCS)
Approximation techniques for average completion time scheduling
SODA '97 Proceedings of the eighth annual ACM-SIAM symposium on Discrete algorithms
Optimized unrolling of nested loops
Proceedings of the 14th international conference on Supercomputing
Optimized Unrolling of Nested Loops
International Journal of Parallel Programming
Three Architectural Models for Compiler-Controlled Speculative Execution
IEEE Transactions on Computers
Scheduling DAG's for Asynchronous Multiprocessor Execution
IEEE Transactions on Parallel and Distributed Systems
Software pipelining: an effective scheduling technique for VLIW machines
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Automatic analysis for managing and optimizing performance-code quality
Proceedings of the 2008 workshop on Static analysis
Hi-index | 0.00 |
This paper studies two compilation techniques for enhancing scalar performance in high-speed scientific processors: software pipelining and loop unrolling. We study the impact of the architecture (size of the register file) and of the hardware (size of instruction buffer) on the efficiency of loop unrolling. We also develop a methodology for classifying software pipelining techniques. For loop unrolling, a straightforward scheduling algorithm is shown to produce near-optimal results when not inhibited by recurrences or memory hazards. Software pipelining requires less hardware but also achieves less speedup. Finally, we show that the performance produced with a modified CRAY-1S scalar architecture and a code scheduler utilizing loop unrolling is comparable to the performance achieved by the CRAY-1S with a vector unit and the CFT vectorizing compiler.