Loop optimization in register-transfer scheduling for DSP-systems
DAC '89 Proceedings of the 26th ACM/IEEE Design Automation Conference
Lifetime-sensitive modulo scheduling
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Iterative modulo scheduling: an algorithm for software pipelining loops
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Minimizing register requirements under resource-constrained rate-optimal software pipelining
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Resource-Constrained Software Pipelining
IEEE Transactions on Parallel and Distributed Systems
Stage scheduling: a technique to reduce the register requirements of a modulo schedule
Proceedings of the 28th annual international symposium on Microarchitecture
Minimizing register requirements of a modulo schedule via optimum stage scheduling
International Journal of Parallel Programming
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
An effective methodology for functional pipelining
ICCAD '92 Proceedings of the 1992 IEEE/ACM international conference on Computer-aided design
RS-FDRA: a register sensitive software pipelining algorithm for embedded VLIW processors
Proceedings of the ninth international symposium on Hardware/software codesign
High-quality operation binding for clustered VLIW datapaths
Proceedings of the 38th annual Design Automation Conference
FDRA: a software-pipelining algorithm for embedded VLIW processors
ISSS '00 Proceedings of the 13th international symposium on System synthesis
Instruction scheduling for clustered VLIW architectures
ISSS '00 Proceedings of the 13th international symposium on System synthesis
HPCA '99 Proceedings of the 5th International Symposium on High Performance Computer Architecture
Swing Modulo Scheduling: A Lifetime-Sensitive Approach
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
A systolic array optimizing compiler
A systolic array optimizing compiler
Rotation scheduling: a loop pipelining algorithm
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Cluster assignment for high-performance embedded VLIW processors
ACM Transactions on Design Automation of Electronic Systems (TODAES)
A scalable wide-issue clustered VLIW with a reconfigurable interconnect
Proceedings of the 2003 international conference on Compilers, architecture and synthesis for embedded systems
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe - Volume 1
Register aware scheduling for distributed cache clustered architecture
ASP-DAC '03 Proceedings of the 2003 Asia and South Pacific Design Automation Conference
Time-constrained scheduling of large pipelined datapaths
Journal of Systems Architecture: the EUROMICRO Journal
Application driven embedded system design: a face recognition case study
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Integration, the VLSI Journal
Compiler-driven leakage energy reduction in banked register files
PATMOS'06 Proceedings of the 16th international conference on Integrated Circuit and System Design: power and Timing Modeling, Optimization and Simulation
Hi-index | 0.00 |
In this paper we describe a software pipelining framework, CALiBeR (Cluster Aware Load Balancing Retiming Algorithm), suitable for compilers targeting clustered embedded VLIW processors. CALiBeR can be effectively used by embedded system designers to explore different code optimization alternatives, i.e., can assist the generation of high-quality customized retiming solutions for desired program memory size and throughput requirements, while minimizing register pressure. An extensive set of experimental results is presented, considering several representative benchmark loop kernels and a wide variety of clustered datapath configurations, demonstrating that our algorithm compares favorably with one of the best state-of-the-art algorithms, achieving up to 50% improvement in performance and up to 47% improvement in register requirements.