Static Placement, Dynamic Issue (SPDI) Scheduling for EDGE Architectures

Authors:
Ramadass Nagarajan;Sundeep K. Kushwaha;Doug Burger;Kathryn S. McKinley;Calvin Lin;Stephen W. Keckler
Affiliations:
The University of Texas at Austin;The University of Texas at Austin;The University of Texas at Austin;The University of Texas at Austin;The University of Texas at Austin;The University of Texas at Austin
Venue:
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Year:
2004

Citing 30
Cited 17

Bulldog: a compiler for VLSI architectures

Bulldog: a compiler for VLSI architectures
Software pipelining: an effective scheduling technique for VLIW machines

PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
IMPACT: an architectural framework for multiple-instruction-issue processors

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Effective compiler support for predicated execution using the hyperblock

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Balanced scheduling: instruction scheduling when memory latency is uncertain

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Dynamically scheduled VLIW processors

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Complexity-effective superscalar processors

Proceedings of the 24th annual international symposium on Computer architecture
The multicluster architecture: reducing cycle time through partitioning

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Integrated predicated and speculative execution in the IMPACT EPIC architecture

Proceedings of the 25th annual international symposium on Computer architecture
Unified assign and schedule: a new approach to scheduling for clustered register file microarchitectures

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Control CPR: a branch height reduction optimization for EPIC architectures

Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Clock rate versus IPC: the end of the road for conventional microarchitectures

Proceedings of the 27th annual international symposium on Computer architecture
Parallel processing: a smart compiler and a dumb machine

SIGPLAN '84 Proceedings of the 1984 SIGPLAN symposium on Compiler construction
Optimal integrated code generation for clustered VLIW architectures

Proceedings of the joint conference on Languages, compilers and tools for embedded systems: software and compilers for embedded systems
The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Increasing processor performance by implementing deeper pipelines

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Baring It All to Software: Raw Machines

Computer
EPIC: Explicitly Parallel Instruction Computing

Computer
High-Speed Electrical Signaling: Overview and Limitations

IEEE Micro
Introducing the IA-64 Architecture

IEEE Micro
Optimizing Loop Performance for Clustered VLIW Architectures

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Convergent scheduling

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Effective instruction scheduling techniques for an interleaved cache clustered VLIW processor

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Very Long Instruction Word architectures and the ELI-512

ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
Treegion Scheduling for Wide Issue Processors

HPCA '98 Proceedings of the 4th International Symposium on High-Performance Computer Architecture
Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture

Proceedings of the 30th annual international symposium on Computer architecture
CARS: A New Code Generation Framework for Clustered ILP Processors

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
WaveScalar

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Scaling to the End of Silicon with EDGE Architectures

Computer

Compiler-directed channel allocation for saving power in on-chip networks

Conference record of the 33rd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Compiling for EDGE Architectures

Proceedings of the International Symposium on Code Generation and Optimization
Reducing NoC energy consumption through compiler-directed channel voltage scaling

Proceedings of the 2006 ACM SIGPLAN conference on Programming language design and implementation
Reducing control overhead in dataflow architectures

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
A spatial path scheduling algorithm for EDGE architectures

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Instruction scheduling for a tiled dataflow architecture

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Merging Head and Tail Duplication for Convergent Hyperblock Formation

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Profile-driven energy reduction in network-on-chips

Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Data locality enhancement for CMPs

Proceedings of the 2007 IEEE/ACM international conference on Computer-aided design
Compiling for vector-thread architectures

Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Software-directed combined cpu/link voltage scaling fornoc-based cmps

SIGMETRICS '08 Proceedings of the 2008 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Application mapping for chip multiprocessors

Proceedings of the 45th annual Design Automation Conference
Compiler directed network-on-chip reliability enhancement for chip multiprocessors

Proceedings of the ACM SIGPLAN/SIGBED 2010 conference on Languages, compilers, and tools for embedded systems
A scheduling approach for distributed resource architectures with scarce communication resources

International Journal of High Performance Systems Architecture
Discerning the dominant out-of-order performance advantage: is it speculation or dynamism?

Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
A general constraint-centric scheduling framework for spatial architectures

Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Constraint centric scheduling guide

ACM SIGARCH Computer Architecture News

Quantified Score

Hi-index	0.00

Visualization

Abstract

Technology trends present new challenges for processor architectures and their instruction schedulers. Growing transistor density will increase the number of execution units on a single chip, and decreasing wire transmission speeds will cause long and variable on-chip latencies. These trends will severely limit the two dominant conventional architectures: dynamic issue superscalars, and static placement and issue VLIWs. We present a new execution model in which the hardware and static scheduler instead work cooperatively, called Static Placement Dynamic Issue (SPDI). This paper focuses on the static instruction scheduler for SPDI. We identify and explore three issues SPDI schedulers must consider-locality, contention, and depth of speculation. We evaluate a range of SPDI scheduling algorithms executing on an Explicit Data Graph Execution (EDGE) architecture. We find that a surprisingly simple one achieves an average of 5.6 instructions-per-cycle (IPC) for SPEC2000 64-wide issue machine, and is within 80% of the performance without on-chip latencies. These results suggest that the compiler is effective at balancing on-chip latency and parallelism, and that the division of responsibilities between the compiler and the architecture is well suited to future systems.