CARS: A New Code Generation Framework for Clustered ILP Processors

Authors:
Krishnan Kailas;Ashok Agrawala;Kemal Ebcioglu
Venue:
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Year:
2001

Citing 0
Cited 49

Affinity-based cluster assignment for unrolled loops

ICS '02 Proceedings of the 16th international conference on Supercomputing
An interleaved cache clustered VLIW processor

ICS '02 Proceedings of the 16th international conference on Supercomputing
Graph-partitioning based instruction scheduling for clustered processors

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Modulo scheduling with integrated register spilling for clustered VLIW architectures

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Cluster assignment for high-performance embedded VLIW processors

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Hierarchical Interconnects for On-Chip Clustering

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Exploiting Pseudo-Schedules to Guide Data Dependence Graph Partitioning

Proceedings of the 2002 International Conference on Parallel Architectures and Compilation Techniques
A Register File Architecture and Compilation Scheme for Clustered ILP Processors

Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Convergent scheduling

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Effective instruction scheduling techniques for an interleaved cache clustered VLIW processor

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Local scheduling techniques for memory coherence in a clustered VLIW processor with a distributed data cache

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Region-based hierarchical operation partitioning for multicluster processors

PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Inter-Cluster Communication Models for Clustered VLIW Processors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Cluster assignment of global values for clustered VLIW processors

Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Flexible Compiler-Managed L0 Buffers for Clustered VLIW Processors

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Integrated temporal and spatial scheduling for extended operand clustered VLIW processors

Proceedings of the 1st conference on Computing frontiers
FLASH: Foresighted Latency-Aware Scheduling Heuristic for Processors with Customized Datapaths

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Static Placement, Dynamic Issue (SPDI) Scheduling for EDGE Architectures

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Evaluation of Bus Based Interconnect Mechanisms in Clustered VLIW Architectures

Proceedings of the conference on Design, Automation and Test in Europe - Volume 2
Distributed Data Cache Designs for Clustered VLIW Processors

IEEE Transactions on Computers
Scalability Aspects of Instruction Distribution Algorithms for Clustered Processors

IEEE Transactions on Parallel and Distributed Systems
Instruction Replication for Reducing Delays Due to Inter-PE Communication Latency

IEEE Transactions on Computers
A Criticality Analysis of Clustering in Superscalar Processors

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Software and hardware techniques to optimize register file utilization in VLIW architectures

International Journal of Parallel Programming
Register aware scheduling for distributed cache clustered architecture

ASP-DAC '03 Proceedings of the 2003 Asia and South Pacific Design Automation Conference
Compiler-directed Data Partitioning for Multicluster Processors

Proceedings of the International Symposium on Code Generation and Optimization
A spatial path scheduling algorithm for EDGE architectures

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Compiler-assisted leakage energy optimization for clustered VLIW architectures

EMSOFT '06 Proceedings of the 6th ACM & IEEE International conference on Embedded software
Impact of intercluster communication mechanisms on ILP in clustered VLIW architectures

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Inter-cluster communication in VLIW architectures

ACM Transactions on Architecture and Code Optimization (TACO)
Virtual Cluster Scheduling Through the Scheduling Graph

Proceedings of the International Symposium on Code Generation and Optimization
INTACTE: an interconnect area, delay, and energy estimation tool for microarchitectural explorations

CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Modulo scheduling for highly customized datapaths to increase hardware reusability

Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Optimal vs. heuristic integrated code generation for clustered VLIW architectures

SCOPES '08 Proceedings of the 11th international workshop on Software & compilers for embedded systems
Register Bank Assignment for Spatially Partitioned Processors

Languages and Compilers for Parallel Computing
Convergent Compilation Applied to Loop Unrolling

Transactions on High-Performance Embedded Architectures and Compilers I
Evaluation of bus based interconnect mechanisms in clustered VLIW architectures

International Journal of Parallel Programming
Compiler-assisted instruction decoder energy optimization for clustered VLIW architectures

HiPC'07 Proceedings of the 14th international conference on High performance computing
Compiler-assisted power optimization for clustered VLIW architectures

Parallel Computing
An efficient heuristic for instruction scheduling on clustered vliw processors

CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
Exploring energy-performance trade-offs for heterogeneous interconnect clustered VLIW processors

HiPC'06 Proceedings of the 13th international conference on High Performance Computing
Integrating a new cluster assignment and scheduling algorithm into an experimental retargetable code generation framework

HiPC'05 Proceedings of the 12th international conference on High Performance Computing
Code generation for STA architecture

Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Integrated Code Generation for Loops

ACM Transactions on Embedded Computing Systems (TECS)
WCET-aware re-scheduling register allocation for real-time embedded systems with clustered VLIW architecture

Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
Compiler-assisted energy optimization for clustered VLIW processors

Journal of Parallel and Distributed Computing
A general constraint-centric scheduling framework for spatial architectures

Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
LUCAS: latency-adaptive unified cluster assignment and instruction scheduling

Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
CAeSaR: unified cluster-assignment scheduling and communication reuse for clustered VLIW processors

Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded Systems

Quantified Score

Hi-index	0.01

Abstract

Clustered ILP processors are characterized by a large number of non-centralized on-chip resources grouped into clusters. Traditional code generation schemes for these processors consist of multiple phases for cluster assignment, register allocation, and instruction scheduling. Most of these approaches need additional re-scheduling phases because they often do not impose finite resource constraints in all phases of code generation. These phase-ordered solutions have several drawbacks, resulting in the generation of poorly performing code. Moreover, the iterative/back-tracking algorithms used in some of these schemes have large running times. In this paper we present CARS, a code generation framework for clustered ILP processors, which combines the cluster assignment, register allocation, and instruction scheduling phases into a single code generation phase, thereby eliminating the problems associated with phase-ordered solutions. The CARS algorithm explicitly takes into account all the resource constraints at each cluster scheduling step to reduce spilling and to avoid iterative re-scheduling steps. We also present a new on-the-fly register allocation scheme developed for CARS. We describe an implementation of the proposed code generation framework and the results of a performance evaluation study using the SPEC95/2000 and MediaBench benchmarks.
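
The abstract does not give the CARS algorithm itself, so the following is only a toy illustration of the general idea it describes: choosing the cluster and the issue cycle for each operation in one pass while checking functional-unit and register limits at every step. All names (`Op`, `schedule`, `fus_per_cluster`, `regs_per_cluster`, `copy_latency`) are hypothetical, the register accounting is a coarse approximation based on scheduling order, and inter-cluster copies are modeled as a fixed delay with no bus or register cost.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    name: str
    preds: list = field(default_factory=list)  # names of ops whose results this op reads
    latency: int = 1


def schedule(ops, n_clusters=2, fus_per_cluster=1, regs_per_cluster=4, copy_latency=1):
    """Greedy one-pass sketch (NOT the published CARS algorithm).

    `ops` must be in topological order. Returns {op name: (cluster, issue cycle)}.
    """
    by_name = {op.name: op for op in ops}
    placed = {}                                  # op name -> (cluster, cycle)
    fu_used = {}                                 # (cluster, cycle) -> slots taken
    live = [set() for _ in range(n_clusters)]    # values holding a register, per cluster
    pending = {op.name: 0 for op in ops}         # consumers not yet scheduled
    for op in ops:
        for p in op.preds:
            pending[p] += 1

    for op in ops:
        # Operands whose last consumer is this op release their registers.
        for p in op.preds:
            pending[p] -= 1
            if pending[p] == 0:
                live[placed[p][0]].discard(p)

        needs_reg = pending[op.name] > 0         # results nobody reads hold no register
        best = None
        for c in range(n_clusters):
            # Register constraint: skip clusters with no free register.
            if needs_reg and len(live[c]) >= regs_per_cluster:
                continue
            # Dependence constraint: operands from other clusters arrive later.
            ready = 0
            for p in op.preds:
                pc, pcycle = placed[p]
                arrival = pcycle + by_name[p].latency + (copy_latency if pc != c else 0)
                ready = max(ready, arrival)
            # Functional-unit constraint: first cycle with a free slot.
            cycle = ready
            while fu_used.get((c, cycle), 0) >= fus_per_cluster:
                cycle += 1
            if best is None or cycle < best[1]:
                best = (c, cycle)

        if best is None:
            raise RuntimeError(f"no cluster has a free register for {op.name}")
        c, cycle = best
        placed[op.name] = best
        fu_used[(c, cycle)] = fu_used.get((c, cycle), 0) + 1
        if needs_reg:
            live[c].add(op.name)
    return placed


# Small dependence graph: c reads a and b, d reads c.
demo = [Op("a"), Op("b"), Op("c", ["a", "b"]), Op("d", ["c"])]
print(schedule(demo))
```

Because cluster choice, cycle choice, and the register check happen in the same step, this sketch never needs a separate re-scheduling pass; a real implementation would also have to model spilling, copy operations, and the interconnect.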