Register aware scheduling for distributed cache clustered architecture

Authors:
Zhong Wang;Xiaobo Sharon Hu;Edwin H.-M. Sha
Affiliations:
University of Notre Dame, Notre Dame IN;University of Notre Dame, Notre Dame IN;University of Texas at Dallas, Richardson TX
Venue:
ASP-DAC '03 Proceedings of the 2003 Asia and South Pacific Design Automation Conference
Year:
2003

Citing 20
Cited 2

Cache coherence protocols: evaluation using a multiprocessor simulation model

ACM Transactions on Computer Systems (TOCS)
An efficient K-way graph partitioning algorithm for task allocation in parallel computing systems

ISCI '90 Proceedings of the first international conference on systems integration on Systems integration '90
Software pipelining: an evaluation of enhanced pipelining

MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Iterative modulo scheduling: an algorithm for software pipelining loops

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Cache miss equations: an analytical representation of cache misses

ICS '97 Proceedings of the 11th international conference on Supercomputing
Data transformations for eliminating conflict misses

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Unified assign and schedule: a new approach to scheduling for clustered register file microarchitectures

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Lx: a technology platform for customizable VLIW embedded processing

Proceedings of the 27th annual international symposium on Computer architecture
Modulo scheduling for a fully-distributed clustered VLIW architecture

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
RS-FDRA: a register sensitive software pipelining algorithm for embedded VLIW processors

Proceedings of the ninth international symposium on Hardware/software codesign
High-quality operation binding for clustered VLIW datapaths

Proceedings of the 38th annual Design Automation Conference
Instruction scheduling for clustered VLIW architectures

ISSS '00 Proceedings of the 13th international symposium on System synthesis
Combined partitioning and data padding for scheduling multiple loop nests

CASES '01 Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems
Modulo scheduling with integrated register spilling for clustered VLIW architectures

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
CALiBeR: a software pipelining algorithm for clustered embedded VLIW processors

Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
On Uniformization of Affine Dependence Algorithms

IEEE Transactions on Computers
Some scheduling techniques and an easily schedulable horizontal architecture for high performance scientific computing

MICRO 14 Proceedings of the 14th annual workshop on Microprogramming
CARS: A New Code Generation Framework for Clustered ILP Processors

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Register-Sensitive Software Pipelining

IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
Rotation scheduling: a loop pipelining algorithm

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

Design principles for a virtual multiprocessor

Proceedings of the 2007 annual research conference of the South African institute of computer scientists and information technologists on IT research in developing countries
Loop Distribution and Fusion with Timing and Code Size Optimization

Journal of Signal Processing Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Increasing wire delays have become a serious problem for sophisticated VLSI designs. Clustered architecture offers a promising alternative to alleviate the problem. In the clustered architecture, the cache, register file and function units are all partitioned into clusters such that short CPU cycle time can be achieved. A key challenge is the arrangement of inter-cluster communication. In this paper, we present a novel algorithm for scheduling inter-cluster communication operations. Our algorithm achieves better register resource utilization than the previous methods. By judiciously putting the selected spilled variables into their corresponding consumer's local cache, the costly cross-cache transfer is minimized. Therefore, the distributed caches are used more efficiently and the register constraint can be satisfied without compromising the schedule performance. The experiments shows that our technique outperforms the existing cluster-oriented schedulers.