Loop pipelining for high-throughput stream computation using self-timed rings

Authors:
Gennette Gill;John Hansen;Montek Singh
Affiliations:
Univ. of North Carolina, Chapel Hill, NC;Univ. of North Carolina, Chapel Hill, NC;Univ. of North Carolina, Chapel Hill, NC
Venue:
Proceedings of the 2006 IEEE/ACM International Conference on Computer-Aided Design
Year:
2006

Abstract

We present a technique for increasing the throughput of stream-processing architectures by removing the bottlenecks caused by loop structures. We implement loops as self-timed pipelined rings that can operate on multiple data sets concurrently. Our contribution includes a transformation algorithm that takes a high-level program as input and produces the structure of an optimized pipelined ring as output. The technique handles nested loops and is further enhanced by loop unrolling. Simulations on benchmark examples show a 1.3x to 4.9x speedup without unrolling and a 2.6x to 9.7x speedup with twofold loop unrolling.
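
The throughput argument in the abstract can be illustrated with a toy timing model. The sketch below is not the authors' transformation algorithm or their simulator. It assumes a loop body split into `stages` ring stages, each taking one time step, and that every data set needs at least one lap around the ring. The names `ring_steps`, `stages`, and `capacity` are hypothetical, and the lock-step advance ignores the handshake timing of a real self-timed circuit. Setting `capacity=1` models a loop that holds one data set at a time. A larger `capacity` lets several data sets circulate concurrently.

```python
from collections import deque


def ring_steps(lap_counts, stages, capacity=None):
    """Count time steps needed to push every data set through a loop ring.

    lap_counts: laps (loop iterations) each data set needs, each >= 1.
    stages:     number of ring stages; a token moves one stage per step.
    capacity:   max data sets in the ring at once (defaults to `stages`).
    This is an idealized lock-step model, an assumption of this sketch.
    """
    capacity = stages if capacity is None else min(capacity, stages)
    waiting = deque(lap_counts)
    ring = [None] * stages  # ring[i] holds the remaining laps of the token in stage i
    in_flight = 0
    steps = 0
    while waiting or in_flight:
        # Every token advances one stage; the one leaving the last stage
        # has completed a lap.
        finished_lap = ring[-1]
        ring = [None] + ring[:-1]
        if finished_lap is not None:
            finished_lap -= 1
            if finished_lap > 0:
                ring[0] = finished_lap  # loop back for another iteration
            else:
                in_flight -= 1  # data set leaves the loop
        # Admit a new data set if the entry stage is free.
        if ring[0] is None and waiting and in_flight < capacity:
            ring[0] = waiting.popleft()
            in_flight += 1
        steps += 1
    return steps


if __name__ == "__main__":
    laps = [2, 2, 2]  # three data sets, two loop iterations each
    one_at_a_time = ring_steps(laps, stages=3, capacity=1)
    concurrent = ring_steps(laps, stages=3)
    print(one_at_a_time, concurrent, one_at_a_time / concurrent)
```

In this model the gain comes only from keeping more ring stages busy. The speedups reported in the abstract come from the authors' own simulations and are not reproduced by this sketch.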