Modern processors perform dynamic scheduling to achieve better utilization of execution resources. A schedule created at run-time is often better than one created at compile-time because it can adapt to specific events encountered during execution. In this paper, we examine fundamental impediments to effective static scheduling. Specifically, we ask why schedules generated quasi-dynamically by a low-level run-time optimizer and executed on a statically scheduled machine perform worse than those produced by a dynamically scheduled approach. We observe that such schedules suffer from region boundaries and from a distribution of parallelism skewed toward the beginning of each region. To overcome these limitations, we investigate a new concept, region slip, in which the schedules of different statically scheduled regions are interleaved in the processor's issue queue to reduce the region-boundary effects that cause empty issue slots.
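To make the boundary effect concrete, the following is a minimal, hypothetical sketch (not the paper's mechanism or data): each region's static schedule issues many instructions in its early cycles and few in its tail, and "slipping" the next region's head into the current region's tail fills the otherwise empty issue slots. The issue width, per-cycle instruction counts, and greedy overlap rule are illustrative assumptions.

```python
# Toy model of issue-slot utilization at region boundaries.
ISSUE_WIDTH = 4

# Instructions issued per cycle by each region's static schedule; parallelism
# is skewed toward the start of each region (assumed shape for illustration).
region_a = [4, 3, 2, 1, 1]
region_b = [3, 3, 2, 1]

def cycles_without_slip(regions):
    # Regions issue strictly back to back: the next region cannot start
    # until the previous one drains, so tail cycles leave slots empty.
    return sum(len(r) for r in regions)

def overlap(tail, head, width):
    # Largest slide (in cycles) of the next region's head into the current
    # region's tail such that no overlapped cycle exceeds the issue width.
    best = 0
    for k in range(1, min(len(tail), len(head)) + 1):
        if all(tail[len(tail) - k + i] + head[i] <= width for i in range(k)):
            best = k
    return best

def cycles_with_slip(regions, width=ISSUE_WIDTH):
    total = len(regions[0])
    for prev, cur in zip(regions, regions[1:]):
        total += len(cur) - overlap(prev, cur, width)
    return total

schedules = [region_a, region_b]
print("strict boundaries:", cycles_without_slip(schedules), "cycles")  # 9
print("with region slip :", cycles_with_slip(schedules), "cycles")     # 7
```

Under these assumed schedules, interleaving the two regions saves two cycles by packing region_b's early instructions into the slots left empty by region_a's tail.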