Principles of CMOS VLSI design: a systems perspective
Principles of CMOS VLSI design: a systems perspective
Profile guided code positioning
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Incremental tree height reduction for high level synthesis
DAC '91 Proceedings of the 28th ACM/IEEE Design Automation Conference
Using profile information to assist classic code optimizations
Software—Practice & Experience
Performance analysis and optimization of schedules for conditional and loop-intensive specifications
DAC '94 Proceedings of the 31st annual Design Automation Conference
Compiler transformations for high-performance computing
ACM Computing Surveys (CSUR)
DAC '96 Proceedings of the 33rd annual Design Automation Conference
Wavesched: a novel scheduling technique for control-flow intensive behavioral descriptions
ICCAD '97 Proceedings of the 1997 IEEE/ACM international conference on Computer-aided design
Optimizing power using transformations
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Common-case computation: a high-level technique for power and performance optimization
Proceedings of the 36th annual ACM/IEEE Design Automation Conference
System-level power optimization: techniques and tools
ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
Throughput optimization of general non-linear computations
ICCAD '99 Proceedings of the 1999 IEEE/ACM international conference on Computer-aided design
Hi-index | 0.00 |
In this paper, we present an algorithm for the application of a general class of transformations to control-flow intensive behavioral descriptions. Our algorithm is based on the observation that incorporation of scheduling information can help guide the selection and application of candidate transformations, and significantly enhance the quality of the synthesized solution. The efficacy of the selected throughput and power optimizing transformations is enhanced by the ability of our algorithm to transcend basic blocks in the behavioral description. This ability is imparted to our algorithm by a general technique we have devised. Our system currently supports associativity, commutativity, distributivity, constant propagation, code motion, and loop unrolling. It is integrated with a scheduler which performs implicit loop unrolling and functional pipelining, and has the ability to parallelize the execution of independent iterative constructs whose bodies can share resources. Other transformations can easily be incorporated within the framework. We demonstrate the efficacy of our algorithm by applying it to several commonly available benchmarks. Upon synthesis, behaviors transformed by the application of our algorithm showed up to 6-fold improvement in throughput over an existing transformation algorithm, and up to 4.5-fold improvement in power over designs produced without the benefit of our algorithm.