Virtual Cluster Scheduling Through the Scheduling Graph

Authors:
Josep M. Codina; Jesus Sanchez; Antonio Gonzalez
Affiliations:
UPC, Barcelona, Spain; UPC, Barcelona, Spain; UPC, Barcelona, Spain
Venue:
Proceedings of the International Symposium on Code Generation and Optimization
Year:
2007

Abstract

This paper presents an instruction scheduling and cluster assignment approach for clustered processors. The proposed technique uses a novel representation, the scheduling graph, which describes all possible schedules. A powerful deduction process is applied to this graph, reducing the set of possible schedules at each step. In contrast to traditional list scheduling techniques, the proposed scheme tries to establish relations among instructions rather than assigning each instruction to a particular cycle. The main advantage is that wrong or poor schedules can be anticipated and discarded earlier. In addition, cluster assignment of instructions is performed using another novel concept, virtual clusters, which define sets of instructions that must execute in the same cluster. These clusters are managed during the deduction process to identify incompatibilities among instructions. The mapping of virtual to physical clusters is postponed until the scheduling of the instructions has finished. The advantages of this approach include (1) accurate scheduling information when assigning clusters and (2) accurate information about the cluster assignment constraints imposed by scheduling decisions. We have implemented and evaluated the proposed scheme with superblocks extracted from SPECint95 and MediaBench. The results show that this approach produces better schedules than the previous state of the art. Speedups are up to 15%, with average speedups ranging from 2.5% (2 clusters) to 9.5% (4 clusters).
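
The virtual-cluster idea described in the abstract can be illustrated with a small sketch. This is not the paper's implementation: it assumes a union-find structure for "must share a cluster" groups, a set of "must differ" constraints between groups, and a simple greedy colouring for the deferred mapping to physical clusters. The class name `VirtualClusters`, its methods, and the instruction names are all hypothetical.

```python
class VirtualClusters:
    """Hypothetical sketch: union-find groups plus 'must differ' constraints."""

    def __init__(self, instructions):
        # Each instruction starts in its own virtual cluster.
        self.parent = {i: i for i in instructions}
        # Incompatibility sets, keyed by (and containing) group representatives.
        self.incompatible = {i: set() for i in instructions}

    def find(self, x):
        # Representative of x's virtual cluster, with path halving.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def fuse(self, a, b):
        """Require a and b to share a cluster; False if that is contradictory."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return True
        if rb in self.incompatible[ra]:
            return False
        self.parent[rb] = ra
        # Redirect every constraint that mentioned rb to the merged group ra.
        for other in self.incompatible.pop(rb):
            self.incompatible[other].discard(rb)
            self.incompatible[other].add(ra)
            self.incompatible[ra].add(other)
        return True

    def separate(self, a, b):
        """Require a and b to be in different clusters; False if already fused."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.incompatible[ra].add(rb)
        self.incompatible[rb].add(ra)
        return True

    def map_to_physical(self, num_physical):
        """Deferred mapping by greedy colouring; None if the greedy pass fails."""
        colour = {}
        for root in sorted({self.find(i) for i in self.parent}):
            used = {colour[o] for o in self.incompatible[root] if o in colour}
            free = [c for c in range(num_physical) if c not in used]
            if not free:
                return None
            colour[root] = free[0]
        return {i: colour[self.find(i)] for i in self.parent}


# Usage with made-up instruction names.
vc = VirtualClusters(["ld1", "add1", "mul1", "st1"])
vc.fuse("ld1", "add1")                  # ld1 and add1 must share a cluster
vc.separate("add1", "mul1")             # add1 and mul1 must not
conflict = not vc.fuse("ld1", "mul1")   # contradiction is detected early
mapping = vc.map_to_physical(2)
```

The point of the sketch is that contradictions between decisions surface as soon as they are recorded, while the choice of physical cluster is left until the end. A greedy colouring can fail even when a valid mapping exists, so a real scheduler would need something more careful at that step.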