Evaluation of scheduling techniques on a SPARC-based VLIW testbed

Authors:
Seongbae Park; SangMin Shim; Soo-Mook Moon
Affiliations:
School of Electrical Engineering, Seoul National University, Seoul 151-742, Korea (all three authors)
Venue:
MICRO 30: Proceedings of the 30th Annual ACM/IEEE International Symposium on Microarchitecture
Year:
1997

Abstract

The performance of Very Long Instruction Word (VLIW) microprocessors depends on close cooperation between the compiler and the architecture. This paper evaluates a set of important compilation techniques and related architectural features for VLIW machines. The evaluation is performed on a SPARC-based VLIW testbed, where optimized SPARC code generated by gcc is scheduled into high-performance VLIW code. Our base scheduling compiler uses three core scheduling techniques: enhanced pipeline scheduling, all-path speculation, and renaming. We analyze the characteristics of the useful and useless ALUs in each cycle to determine how many of them execute non-speculative operations, speculative operations, and copies, respectively. We then evaluate the following compilation techniques: software pipelining, loop unrolling, non-greedy enhanced pipeline scheduling, profile-based all-path speculation, trace-based speculation, renaming, restricted speculative loads, and memory disambiguation. Because all experiments run on a uniform testbed and rest on a detailed analysis of ALU usage, our evaluation provides useful insight into the performance impact of these techniques.
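The per-cycle ALU analysis mentioned above can be illustrated with a small accounting sketch. This is not the authors' tool. It assumes a hypothetical executed trace in which each VLIW cycle is a list of ALU slots, and each slot holds either nothing (an idle ALU) or an operation tagged with a hypothetical kind (`nonspec`, `spec`, or `copy`) and a `used` flag saying whether its result was consumed on the executed path. Mapping "useful" to `used=True` and "useless" to `used=False` is an assumption about the paper's terminology.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Op:
    # Hypothetical labels: "nonspec", "spec", or "copy".
    kind: str
    # Assumption: True if the result was consumed on the executed path.
    used: bool


# One entry per ALU in a cycle; None marks an idle ALU.
Cycle = List[Optional[Op]]
Key = Tuple[str, Optional[str]]


def alu_breakdown(trace: List[Cycle]) -> Counter:
    """Count ALU slots by (usefulness, kind) over an executed trace."""
    counts: Counter = Counter()
    for cycle in trace:
        for slot in cycle:
            if slot is None:
                counts[("idle", None)] += 1
            else:
                status = "useful" if slot.used else "useless"
                counts[(status, slot.kind)] += 1
    return counts


def per_cycle_average(counts: Counter, n_cycles: int) -> Dict[Key, float]:
    """Average number of ALUs per cycle in each category."""
    if n_cycles == 0:
        return {}
    return {key: n / n_cycles for key, n in counts.items()}


if __name__ == "__main__":
    # A toy two-cycle trace on a hypothetical 4-ALU machine.
    trace = [
        [Op("nonspec", True), Op("spec", True), Op("copy", True), None],
        [Op("nonspec", True), Op("spec", False), None, None],
    ]
    print(per_cycle_average(alu_breakdown(trace), len(trace)))
```

Keeping speculative operations whose results are discarded in their own bucket separates ALU slots that are occupied but wasted from slots that are simply idle. That distinction is what a breakdown of useful versus useless ALUs is meant to expose.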