A constraint programming approach for integrated spatial and temporal scheduling for clustered architectures

Authors:
Mirza Beg; Peter van Beek
Affiliations:
University of Waterloo; University of Waterloo
Venue:
ACM Transactions on Embedded Computing Systems (TECS)
Year:
2013

Abstract

Many embedded processors use clustering to scale up instruction-level parallelism in a cost-effective manner. In a clustered architecture, the registers and functional units are partitioned into smaller units, and clusters communicate through register-to-register copy operations. Texas Instruments, for example, has a series of clustered architectures for embedded processors. Such an architecture places a heavier burden on the compiler, which must now assign instructions to clusters (spatial scheduling), assign instructions to cycles (temporal scheduling), and schedule copy operations to move data between clusters. We consider instruction scheduling of local blocks of code on clustered architectures to improve performance. Scheduling for space and time is known to be a hard problem. Previous work has proposed two families of approaches: greedy approaches based on list scheduling, which perform spatial and temporal scheduling simultaneously, and phased approaches, which first partition a block of code to make the spatial assignment and then perform temporal scheduling. Greedy approaches risk making mistakes that are costly to recover from, and partitioning approaches suffer from the well-known phase ordering problem. In this article, we present a constraint programming approach for scheduling instructions on clustered architectures. We employ a problem decomposition technique that solves spatial and temporal scheduling in an integrated manner. We analyze the effect of different hardware parameters, such as the number of clusters, the issue width, and the intercluster communication cost, on application performance. We found that our approach achieved an average improvement of up to 26% over a state-of-the-art technique on superblocks from the SPEC 2000 benchmarks.
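The following Python sketch makes the spatial/temporal trade-off concrete. It relies on simplifying assumptions that do not come from the article: latencies are integer cycle counts, clusters are identical with a fixed number of issue slots per cycle, and an intercluster copy is modeled as extra latency (`copy_cost`) rather than as a copy instruction occupying its own issue slot. The sketch compares a greedy one-pass scheduler, which commits each instruction to whichever cluster lets it issue earliest, with an exhaustive branch-and-bound search over (cluster, cycle) pairs. The search is a plain backtracking stand-in for a constraint solver. It is not the authors' constraint programming model and not their decomposition technique. All names (`Machine`, `earliest_start`, `length`, `greedy_schedule`, `optimal_schedule`) are hypothetical, and instructions are assumed to be numbered in a topological order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """Hypothetical toy machine model (an assumption, not from the article)."""
    clusters: int   # number of clusters
    width: int      # issue slots per cluster per cycle
    copy_cost: int  # extra cycles when a value crosses clusters


def earliest_start(i, cluster, preds, lat, place, m):
    """Earliest cycle instruction i may issue on `cluster`, given the
    (cluster, cycle) already chosen for each of its predecessors."""
    t = 0
    for p in preds[i]:
        p_cluster, p_cycle = place[p]
        ready = p_cycle + lat[p]
        if p_cluster != cluster:
            ready += m.copy_cost  # value must be copied between clusters
        t = max(t, ready)
    return t


def length(place, lat):
    """Schedule length: cycle in which the last result becomes available."""
    return max(cycle + lat[i] for i, (_, cycle) in place.items())


def greedy_schedule(lat, preds, m):
    """One pass in topological order; each instruction goes to the cluster
    where it can issue earliest (ties go to the lower cluster index)."""
    place, used = {}, {}  # used[(cluster, cycle)] = issue slots taken
    for i in range(len(lat)):
        best = None
        for c in range(m.clusters):
            t = earliest_start(i, c, preds, lat, place, m)
            while used.get((c, t), 0) >= m.width:
                t += 1  # all slots taken: slide to the next cycle
            if best is None or t < best[1]:
                best = (c, t)
        place[i] = best
        used[best] = used.get(best, 0) + 1
    return place


def optimal_schedule(lat, preds, m):
    """Exhaustive depth-first branch and bound over (cluster, cycle) pairs,
    seeded with the greedy schedule. Exponential: tiny blocks only."""
    best_place = greedy_schedule(lat, preds, m)
    best_len = length(best_place, lat)
    place, used = {}, {}

    def dfs(i, cur_len):
        nonlocal best_place, best_len
        if cur_len >= best_len:
            return  # this partial schedule cannot beat the incumbent
        if i == len(lat):
            best_place, best_len = dict(place), cur_len
            return
        for c in range(m.clusters):
            est = earliest_start(i, c, preds, lat, place, m)
            # Only try issue cycles that finish before the incumbent's length.
            for t in range(est, best_len - lat[i]):
                if used.get((c, t), 0) >= m.width:
                    continue
                place[i] = (c, t)
                used[(c, t)] = used.get((c, t), 0) + 1
                dfs(i + 1, max(cur_len, t + lat[i]))
                used[(c, t)] -= 1
                del place[i]

    dfs(0, 0)
    return best_place


if __name__ == "__main__":
    # Hypothetical block: a (0) and x (1) are independent; y (2) = f(a, x).
    lat = [1, 1, 1]
    preds = [[], [], [0, 1]]
    m = Machine(clusters=2, width=1, copy_cost=3)
    g = greedy_schedule(lat, preds, m)
    o = optimal_schedule(lat, preds, m)
    print(length(g, lat), length(o, lat))  # prints: 5 3
```

In this example the greedy pass issues `x` on the idle second cluster at cycle 0. As a result, `y` must wait for a three-cycle copy, and the block takes 5 cycles. The search instead keeps all three instructions on cluster 0 and finishes in 3 cycles. This is the kind of early commitment that the abstract says greedy approaches find costly to recover from. Because the search space grows exponentially with block size, a constraint programming formulation would instead prune it with constraint propagation rather than enumerate it naively.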