ACM SIGPLAN Notices
Modulo scheduling for a fully-distributed clustered VLIW architecture
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Lifetime-Sensitive Modulo Scheduling in a Production Environment
IEEE Transactions on Computers
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
High-quality operation binding for clustered VLIW datapaths
Proceedings of the 38th annual Design Automation Conference
Instruction scheduling for clustered VLIW architectures
ISSS '00 Proceedings of the 13th international symposium on System synthesis
Graph-partitioning based instruction scheduling for clustered processors
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Cluster assignment for high-performance embedded VLIW processors
ACM Transactions on Design Automation of Electronic Systems (TODAES)
CALiBeR: a software pipelining algorithm for clustered embedded VLIW processors
Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
Dynamic Code Partitioning for Clustered Architectures
International Journal of Parallel Programming
The Effectiveness of Loop Unrolling for Modulo Scheduling in Clustered VLIW Architectures
ICPP '00 Proceedings of the Proceedings of the 2000 International Conference on Parallel Processing
Removing communications in clustered microarchitectures through instruction replication
ACM Transactions on Architecture and Code Optimization (TACO)
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Virtual Cluster Scheduling Through the Scheduling Graph
Proceedings of the International Symposium on Code Generation and Optimization
Heterogeneous Clustered VLIW Microarchitectures
Proceedings of the International Symposium on Code Generation and Optimization
Data pipeline optimization for shared memory multiple-SIMD architecture
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Integrated Code Generation for Loops
ACM Transactions on Embedded Computing Systems (TECS)
Integrated modulo scheduling and cluster assignment for TI TMS320C64x+ architecture
Proceedings of the 11th Workshop on Optimizations for DSP and Embedded Systems
Hi-index | 0.00 |
Wide-issue ILP machines can be built using the VLIW approach as many of the hardware complexities found in superscalar processors can be transferred to the compiler. However, the scalability of VLIW architectures is still constrained by the size and number of ports of the register file required by a large number of functional units.Organizations composed by clusters of a few functional units and small private register files have been proposed to deal with this problem, an approach highly dependent on scheduling and partitioning strategies. This paper presents DMS, an algorithm that integrates modulo scheduling and code partitioning in a single procedure. Experimental results have shown the algorithm is effective for configurations up to 8 clusters, or even more when targeting vectorizable loops.