Limits of instruction-level parallelism
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
IMPACT: an architectural framework for multiple-instruction-issue processors
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Limits of control flow on parallelism
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Instruction-level parallel processing: history, overview, and perspective
The Journal of Supercomputing - Special issue on instruction-level parallelism
Modulo scheduling with multiple initiation intervals
Proceedings of the 28th annual international symposium on Microarchitecture
Modulo scheduling of loops in control-intensive non-numeric programs
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
DAISY: dynamic compilation for 100% architectural compatibility
Proceedings of the 24th annual international symposium on Computer architecture
Exploiting instruction level parallelism in the presence of conditional branches
Exploiting instruction level parallelism in the presence of conditional branches
Parallelizing nonnumerical code with selective scheduling and software pipelining
ACM Transactions on Programming Languages and Systems (TOPLAS)
Boosting beyond static scheduling in a superscalar processor
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Making Compaction-Based Parallelization Affordable
IEEE Transactions on Parallel and Distributed Systems
Generalized Multiway Branch Unit for VLIW Microprocessors
IEEE Transactions on Parallel and Distributed Systems
Parallelizing nonnumerical code with selective scheduling and software pipelining
ACM Transactions on Programming Languages and Systems (TOPLAS)
Split-path enhanced pipeline scheduling for loops with control flows
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Unroll-based register coalescing
Proceedings of the 14th international conference on Supercomputing
Power-aware modulo scheduling for high-performance VLIW processors
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Optimal software pipelining of loops with control flows
ICS '02 Proceedings of the 16th international conference on Supercomputing
Unroll-Based Copy Elimination for Enhanced Pipeline Scheduling
IEEE Transactions on Computers
Unroll-Based Copy Elimination for Enhanced Pipeline Scheduling
LCPC '99 Proceedings of the 12th International Workshop on Languages and Compilers for Parallel Computing
Comparing Tail Duplication with Compensation Code in Single Path Global Instruction Scheduling
CC '01 Proceedings of the 10th International Conference on Compiler Construction
Time optimal software pipelining of loops with control flows
International Journal of Parallel Programming
Rotating register allocation with multiple rotating branches
Proceedings of the 22nd annual international conference on Supercomputing
Hi-index | 0.00 |
The performance of Very Long Instruction Word (VLIW) microprocessors depends on the close cooperation between the compiler and the architecture. This paper evaluates a set of important compilation techniques and related architectural features for VLIW machines. The evaluation is performed on a SPARC-based VLIW testbed where gcc-generated optimized SPARC code is scheduled into high-performance VLIW code. As a base scheduling compiler, we experiment with three core scheduling techniques including enhanced pipeline scheduling, all-path speculation, and renaming. We analyze the characteristics of the useful and useless ALUs in each cycle to see how many of those ALUs execute non-speculative operations, speculative operations, and copies, respectively. Then, we evaluate the following compilation techniques: software pipelining, loop unrolling, non-greedy enhanced pipeline scheduling, profile-based all-path speculation, trace-based speculation, renaming, restricted speculative loads, and memory disambiguation. Since we experiment on a uniform testbed based on a detailed analysis of ALUs, our evaluation provides an useful insight on the performance impact of these techniques.