Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
Architectural retiming: pipelining latency-constrained circuits
DAC '96 Proceedings of the 33rd annual Design Automation Conference
Proceedings of the 38th annual Design Automation Conference
High-level automatic pipelining for sequential circuits
Proceedings of the 14th international symposium on Systems synthesis
The optimum pipeline depth for a microprocessor
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Synchronous Interlocked Pipelines
ASYNC '02 Proceedings of the 8th International Symposium on Asynchronus Circuits and Systems
Efficient design space exploration of high performance embedded out-of-order processors
Proceedings of the conference on Design, automation and test in Europe: Proceedings
Synthesis of synchronous elastic architectures
Proceedings of the 43rd annual Design Automation Conference
Performance analysis of concurrent systems with early evaluation
Proceedings of the 2006 IEEE/ACM international conference on Computer-aided design
Synchronous elastic circuits with early evaluation and token counterflow
Proceedings of the 44th annual Design Automation Conference
Correct-by-construction microarchitectural pipelining
Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design
Retiming and recycling for elastic systems with early evaluation
Proceedings of the 46th Annual Design Automation Conference
Theory of latency-insensitive design
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Microarchitectural Transformations Using Elasticity
ACM Journal on Emerging Technologies in Computing Systems (JETC)
Hi-index | 0.00 |
This paper presents a method for automatic microarchitectural pipelining of systems with loops. The original specification is pipelined by performing provably-correct transformations including conversion to a synchronous elastic form, early evaluation, inserting empty buffers, anti-tokens, and retiming. The design exploration is done by solving an optimization problem followed by simulation of solutions. The method is explained on a DLX microprocessor example. The impact of different microarchitectural parameters on the performance is analyzed.