Increasing wire delays have become a serious problem for sophisticated VLSI designs. Clustered architectures offer a promising way to alleviate this problem. In a clustered architecture, the cache, register file, and functional units are all partitioned into clusters so that a short CPU cycle time can be achieved. A key challenge is the arrangement of inter-cluster communication. In this paper, we present a novel algorithm for scheduling inter-cluster communication operations. Our algorithm achieves better register resource utilization than previous methods. By judiciously placing selected spilled variables in their consumers' local caches, it minimizes costly cross-cache transfers. As a result, the distributed caches are used more efficiently, and the register constraint can be satisfied without compromising schedule performance. Experiments show that our technique outperforms existing cluster-oriented schedulers.
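To make the spill-placement idea concrete, here is a minimal sketch under a deliberately simplified, hypothetical cost model that is not taken from the paper. It assumes each spilled value has one producer cluster, one consumer cluster, and a known number of reloads. A spill store that leaves the producer's cluster counts as one cross-cache transfer, and every reload issued from a cluster other than the one whose cache holds the value also counts as one. The names `SpilledVar`, `transfers`, and `total_transfers` are illustrative only. The sketch does not model spill selection, register pressure, or scheduling.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class SpilledVar:
    """Hypothetical record for a spilled value in a clustered machine."""
    name: str
    producer: int  # cluster whose operation defines the value
    consumer: int  # cluster whose operations use the value
    reloads: int   # number of reloads the consumer issues


def transfers(v: SpilledVar, home: int) -> int:
    """Cross-cache transfers if v is spilled into cluster `home`'s local cache.

    Assumed cost model: the spill store crosses caches once if `home` is not
    the producer's cluster, and each reload crosses caches if `home` is not
    the consumer's cluster.
    """
    store = 1 if home != v.producer else 0
    loads = v.reloads if home != v.consumer else 0
    return store + loads


def total_transfers(spills: Iterable[SpilledVar],
                    choose_home: Callable[[SpilledVar], int]) -> int:
    """Sum cross-cache transfers over all spills for a given placement policy."""
    return sum(transfers(v, choose_home(v)) for v in spills)


spills = [
    SpilledVar("a", producer=0, consumer=1, reloads=3),
    SpilledVar("b", producer=1, consumer=1, reloads=2),
    SpilledVar("c", producer=2, consumer=0, reloads=1),
]

# Baseline policy: spill into the producer's cache.
baseline = total_transfers(spills, lambda v: v.producer)
# Policy described in the abstract: spill into the consumer's cache.
consumer_local = total_transfers(spills, lambda v: v.consumer)
print(baseline, consumer_local)  # 4 2
```

Under this model, placing a value in its consumer's cache pays at most one remote store and makes every reload local, while producer-side placement pays one remote access per reload whenever producer and consumer differ. The real algorithm's costs and selection criteria are described in the paper, not here.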