Self-timed rings and their application to division
Self-timed rings and their application to division
ACM Computing Surveys (CSUR)
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Efficient conditional operations for data-parallel architectures
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Transformations for the synthesis and optimization of asynchronous distributed control
Proceedings of the 38th annual Design Automation Conference
Optimizing compilers for modern architectures: a dependence-based approach
Optimizing compilers for modern architectures: a dependence-based approach
Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
Perfect Pipelining: A New Loop Parallelization Technique
ESOP '88 Proceedings of the 2nd European Symposium on Programming
ASYNC '97 Proceedings of the 3rd International Symposium on Advanced Research in Asynchronous Circuits and Systems
Spatial computation
Performance estimation and slack matching for pipelined asynchronous architectures with choice
Proceedings of the 2008 IEEE/ACM International Conference on Computer-Aided Design
Hi-index | 0.00 |
We present a technique for increasing the throughput of stream processing architectures by removing the bottlenecks caused by loop structures. We implement loops as self-timed pipelined rings that can operate on multiple data sets concurrently. Our contribution includes a transformation algorithm which takes as input a high-level program and gives as output the structure of an optimized pipeline ring. Our technique handles nested loops and is further enhanced by loop unrolling. Simulations run on benchmark examples show a 1.3 to 4.9x speedup without unrolling and a 2.6 to 9.7x speedup with twofold loop unrolling.