Self-timed rings and their application to division
Performance analysis based on timing simulation
DAC '94 Proceedings of the 31st annual Design Automation Conference
Four-phase micropipeline latch control circuits
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
An optimal clock period selection method based on slack minimization criteria
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Minimum area retiming with equivalent initial states
ICCAD '97 Proceedings of the 1997 IEEE/ACM international conference on Computer-aided design
MediaBench: a tool for evaluating and synthesizing multimedia and communications systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Advanced compiler design and implementation
On the optimization power of retiming and resynthesis transformations
Proceedings of the 1998 IEEE/ACM international conference on Computer-aided design
Resynthesis and peephole transformations for the optimization of large-scale asynchronous systems
Proceedings of the 39th annual Design Automation Conference
Proceedings of the 2000 IEEE/ACM international conference on Computer-aided design
Bounding Average Time Separations of Events in Stochastic Timed Petri Nets with Choice
ASYNC '99 Proceedings of the 5th International Symposium on Advanced Research in Asynchronous Circuits and Systems
ASPLOS XI Proceedings of the 11th international conference on Architectural support for programming languages and operating systems
System-level scheduling on instruction cell based reconfigurable systems
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Leveraging protocol knowledge in slack matching
Proceedings of the 2006 IEEE/ACM international conference on Computer-aided design
Global critical path: a tool for system-level timing analysis
Proceedings of the 44th annual Design Automation Conference
Slack analysis in the system design loop
CODES+ISSS '08 Proceedings of the 6th IEEE/ACM/IFIP international conference on Hardware/Software codesign and system synthesis
Performance-driven clustering of asynchronous circuits
PATMOS'11 Proceedings of the 21st international conference on Integrated circuit and system design: power and timing modeling, optimization, and simulation
We define operation chaining (op-chaining) as the optimization problem of determining the pipeline depth that best balances performance against energy demands in pipelined asynchronous designs. Because there is no clock-period requirement, asynchronous pipeline stages can have non-uniform latencies. We exploit this fact to coalesce several stages, saving power and area by eliminating control-path resources from the pipeline. The trade-off is potentially reduced pipeline parallelism. In this paper, we formally define this optimization as a graph covering problem that finds sub-graphs, each of which will be synthesized as a single op-chained pipeline stage. We then define the solution space of provably correct solutions and present an algorithm that searches this space efficiently. The search technique partitions the graph based on post-dominator relationships to find sub-graphs that are potential op-chain candidates. We use knowledge of the Global Critical Path (GCP) [13] to evaluate the performance impact of accepting a candidate sub-graph, and we formulate a heuristic cost function to model this trade-off. The algorithm has quadratic time complexity in the size of the dataflow graph. We have implemented this algorithm within an automated asynchronous synthesis toolchain [12]. Experimental evidence from applying the algorithm to several media processing kernels shows that the average energy-delay and energy-delay-area products improve by about 1.4x and 1.8x, respectively, with maximum improvements of 5x and 18x.
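As a rough illustration of the post-dominator relationships mentioned above, the following Python sketch computes post-dominator sets on a small dataflow graph using the standard iterative data-flow formulation. It is not the paper's search algorithm: the function names, the example graph, and the assumptions of a single exit node and at least one successor for every other node are all hypothetical choices made for this example.

```python
def post_dominators(succ, exit_node):
    """Iterative computation of post-dominator sets.

    succ maps each node to a list of its successors (assumed non-empty
    for every node except exit_node); exit_node is the single sink.
    """
    nodes = set(succ)
    # Start from the most pessimistic guess and shrink to a fixed point.
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            # A node is post-dominated by itself plus whatever
            # post-dominates every one of its successors.
            common = set.intersection(*(pdom[s] for s in succ[n]))
            new = {n} | common
            if new != pdom[n]:
                pdom[n] = new
                changed = True
    return pdom


def immediate_post_dominator(pdom, n):
    """Closest strict post-dominator of n, or None for the exit node."""
    strict = pdom[n] - {n}
    for c in strict:
        # The closest one is post-dominated by all the other candidates.
        if strict <= pdom[c]:
            return c
    return None


# Hypothetical diamond-shaped graph: a forks to b and c, which join at d.
example_succ = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
example_pdom = post_dominators(example_succ, "e")
example_ipdom = immediate_post_dominator(example_pdom, "a")
```

In this example the immediate post-dominator of `a` is `d`, so the region spanning `a` through `d` is the kind of single-entry, single-exit sub-graph that a partitioning step could propose as a candidate. Whether to accept it would still depend on a cost function such as the one the abstract describes.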