Exploiting Pseudo-Schedules to Guide Data Dependence Graph Partitioning

Authors:
Alex Aletà;Josep M. Codina;F. Jesús Sánchez;Antonio González;David R. Kaeli
Venue:
Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
Year:
2002

Abstract

This paper presents a new modulo scheduling algorithm for clustered microarchitectures. The main feature of the proposed scheme is that the assignment of instructions to clusters is done by means of graph partitioning algorithms that are guided by a pseudo-scheduler. This pseudo-scheduler is a simplified version of the full instruction scheduler and estimates key constraints that would be encountered in the final schedule. The final scheduling process is bi-directional and includes on-the-fly spill code generation. The proposed scheme is evaluated against previous scheduling approaches using the SPECfp95 benchmark suite. Our modeling results show that better schedules are obtained for most programs across a range of different architectures. For a 4-cluster VLIW architecture with 32 registers and a 2-cycle inter-cluster communication delay we obtain an average speedup of 38.5%.
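
As a rough illustration of what estimating constraints for a candidate cluster assignment might involve, the sketch below computes a lower bound on the initiation interval (II) from per-cluster resource load and inter-cluster bus traffic. It is not the paper's pseudo-scheduler: the function name `estimate_ii`, the uniform functional-unit model, and the assumption that each bus carries one value per cycle are all hypothetical choices made for this example.

```python
from collections import Counter
from math import ceil


def estimate_ii(edges, assignment, units_per_cluster, buses):
    """Lower bound on the II for one cluster assignment (illustrative only).

    edges: iterable of (producer, consumer) data dependences.
    assignment: dict mapping each instruction to a cluster id.
    units_per_cluster: functional units per cluster (assumed identical).
    buses: inter-cluster buses, each moving one value per cycle (assumed).
    """
    # Resource bound: the most heavily loaded cluster limits the II.
    load = Counter(assignment.values())
    res_ii = max(ceil(n / units_per_cluster) for n in load.values())

    # Communication bound: one transfer per (producer, destination cluster),
    # so a value sent once to a cluster serves all of its consumers there.
    comms = {
        (src, assignment[dst])
        for src, dst in edges
        if assignment[src] != assignment[dst]
    }
    bus_ii = ceil(len(comms) / buses) if comms else 0

    return max(res_ii, bus_ii, 1)


# A small diamond-shaped dependence graph: a feeds b and c, which feed d.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
balanced = {"a": 0, "b": 0, "c": 1, "d": 1}
single = {"a": 0, "b": 0, "c": 0, "d": 0}

print(estimate_ii(edges, balanced, units_per_cluster=1, buses=1))
print(estimate_ii(edges, single, units_per_cluster=1, buses=1))
```

A partitioner could call such an estimate on each candidate assignment and prefer the one with the smaller bound. A fuller estimate would also need to account for recurrences, communication latency, and register pressure, which this sketch ignores.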