Cluster assignment of global values for clustered VLIW processors

Authors:
Andrei Terechko;Erwan Le Thénaff;Henk Corporaal
Affiliations:
Philips Research, Eindhoven, The Netherlands;Philips Research, Eindhoven, The Netherlands;Technical University Eindhoven, Eindhoven, The Netherlands
Venue:
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
Year:
2003

Abstract

In this paper, high-level language (HLL) variables that are live throughout an entire HLL function, across multiple scheduling units, are termed global values. Because of their long live ranges and consequently large impact on the schedule, global values require different compiler optimizations than local values, which are confined to a single scheduling unit. The instruction scheduler for a clustered ILP processor, which is responsible for assigning operations and variables to clusters, faces the difficult problem of assigning global values to clusters. Our study shows that trivial assignments (e.g., mapping all global values to one cluster) can incur a severe cycle count overhead relative to the unicluster machine, up to 26.3% for a four-cluster VLIW machine. This paper presents three advanced algorithms for assigning global values to clusters, based on multi-pass scheduling and on the affinity of variables. Furthermore, we measure the performance of these algorithms on optimized multimedia C applications and assess their quality by comparing them to a practical upper bound on performance derived from a vast random search. Our algorithms reduce the execution time overhead of the best simple algorithm, round-robin, from 10.5% to 5.9% for the two-cluster VLIW machine and from 17.3% to 14.12% for the four-cluster VLIW machine.
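
To make the assignment problem concrete, the following minimal Python sketch shows the two trivial strategies named in the abstract (all values in one cluster, and round-robin) next to a simple greedy heuristic driven by pairwise affinity. The abstract does not describe the paper's three algorithms, so `affinity_greedy` is only an illustration of the general idea and not the authors' method. The affinity weights, the per-cluster `capacity` limit, and the `cut_affinity` metric are all hypothetical choices made for this example.

```python
def single_cluster(global_values, num_clusters):
    """Trivial baseline: map every global value to cluster 0."""
    return {v: 0 for v in global_values}


def round_robin(global_values, num_clusters):
    """Simple baseline: deal values out to the clusters in turn."""
    return {v: i % num_clusters for i, v in enumerate(global_values)}


def affinity_greedy(global_values, num_clusters, affinity, capacity):
    """Illustrative heuristic (an assumption, not the paper's algorithm).

    Each value goes to the cluster whose already-placed values it has the
    highest total affinity with, subject to a per-cluster capacity.
    Ties are broken by lower load, then by lower cluster index.
    """
    assignment = {}
    load = [0] * num_clusters
    for v in global_values:
        best, best_key = None, None
        for c in range(num_clusters):
            if load[c] >= capacity:
                continue  # cluster is full
            score = sum(
                affinity.get(frozenset((v, u)), 0)
                for u, cu in assignment.items()
                if cu == c
            )
            key = (score, -load[c], -c)
            if best_key is None or key > best_key:
                best, best_key = c, key
        if best is None:
            raise ValueError("not enough cluster capacity for all values")
        assignment[v] = best
        load[best] += 1
    return assignment


def cut_affinity(assignment, affinity):
    """Total affinity between values placed in different clusters.

    Used here as a rough, hypothetical proxy for inter-cluster traffic.
    """
    return sum(
        w
        for pair, w in affinity.items()
        if len({assignment[x] for x in pair}) > 1
    )


# Hypothetical example: four global values, two clusters.
values = ["a", "b", "c", "d"]
affinity = {
    frozenset(("a", "b")): 5,
    frozenset(("c", "d")): 4,
    frozenset(("b", "c")): 1,
}

rr = round_robin(values, 2)
greedy = affinity_greedy(values, 2, affinity, capacity=2)
print(rr, cut_affinity(rr, affinity))
print(greedy, cut_affinity(greedy, affinity))
```

In this toy input, round-robin separates both strongly related pairs, while the greedy heuristic keeps them together and only splits the weakest pair. Mapping everything to one cluster would give a cut of zero under this metric, but it ignores the load imbalance that, according to the abstract, makes such trivial assignments costly in practice.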