An efficient heuristic for instruction scheduling on clustered vliw processors
CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
Compiler supports for VLIW DSP processors with SIMD intrinsics
Concurrency and Computation: Practice & Experience
LUCAS: latency-adaptive unified cluster assignment and instruction scheduling
Proceedings of the 14th ACM SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 14.98 |
This paper presents AGAMOS, a technique to modulo schedule loops on clustered microarchitectures. The proposed scheme uses a multilevel graph partitioning strategy to distribute the workload among clusters and reduces the number of intercluster communications at the same time. Partitioning is guided by approximate schedules (i.e., pseudoschedules), which take into account all of the constraints that influence the final schedule. To further reduce the number of intercluster communications, heuristics for instruction replication are included. The proposed scheme is evaluated using the SPECfp95 programs. The described scheme outperforms a state-of-the-art scheduler for all programs and different cluster configurations. For some configurations, the speedup obtained when using this new scheme is greater than 40 percent, and for selected programs, performance can be more than doubled.