Force-directed scheduling in automatic data path synthesis
DAC '87 Proceedings of the 24th ACM/IEEE Design Automation Conference
The priority-based coloring approach to register allocation
ACM Transactions on Programming Languages and Systems (TOPLAS)
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Lx: a technology platform for customizable VLIW embedded processing
Proceedings of the 27th annual international symposium on Computer architecture
Graph-partitioning based instruction scheduling for clustered processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Cluster assignment for high-performance embedded VLIW processors
ACM Transactions on Design Automation of Electronic Systems (TODAES)
The TigerSHARC DSP Architecture
IEEE Micro
Inter-Cluster Communication Models for Clustered VLIW Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Instruction Scheduling for Clustered VLIW DSPs
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
CARS: A New Code Generation Framework for Clustered ILP Processors
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Compiler-assisted leakage energy optimization for clustered VLIW architectures
EMSOFT '06 Proceedings of the 6th ACM & IEEE International conference on Embedded software
Inter-cluster communication in VLIW architectures
ACM Transactions on Architecture and Code Optimization (TACO)
INTACTE: an interconnect area, delay, and energy estimation tool for microarchitectural explorations
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Optimal vs. heuristic integrated code generation for clustered VLIW architectures
SCOPES '08 Proceedings of the 11th international workshop on Software & compilers for embedded systems
Compiler-assisted instruction decoder energy optimization for clustered VLIW architectures
HiPC'07 Proceedings of the 14th international conference on High performance computing
Compiler-assisted power optimization for clustered VLIW architectures
Parallel Computing
Exploring energy-performance trade-offs for heterogeneous interconnect clustered VLIW processors
HiPC'06 Proceedings of the 13th international conference on High Performance Computing
Integrated Code Generation for Loops
ACM Transactions on Embedded Computing Systems (TECS)
Compiler-assisted energy optimization for clustered VLIW processors
Journal of Parallel and Distributed Computing
SCRF: a hybrid register file architecture
PaCT'07 Proceedings of the 9th international conference on Parallel Computing Technologies
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.00 |
Centralized register file architectures scale poorly in terms of clock rate, chip area, and power consumption and are thus not suitable for consumer electronic devices. The consequence is the emergence of architectures having many interconnected clusters each with a separate register file and a few functional units. Among the many inter-cluster communication models proposed, the extended operand model extends some of operand fields of instruction with a cluster specifier and allows an instruction to read some of the operands from other clusters without any extra cost.Scheduling for clustered processors involves spatial concerns (where to schedule) as well as temporal concerns (when to schedule). A scheduler is responsible for resolving the conflicting requirements of aggressively exploiting the parallelism offered by hardware and limiting the communication among clusters to available slots. This paper proposes an integrated spatial and temporal scheduling algorithm for extended operand clustered VLIW processors and evaluates its effectiveness in improving the run time performance of the code without code size penalty.