Achieving Full Parallelism Using Multidimensional Retiming

Authors:
Nelson Luiz Passos;Edwin Hsing-Mean Sha
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1996

Abstract

Most scientific and digital signal processing (DSP) applications are recursive or iterative. Transformation techniques are usually applied to obtain optimal execution rates in parallel and/or pipelined systems. Retiming is a common and valuable transformation tool for one-dimensional problems in which loops are represented by data flow graphs (DFGs). In this paper, uniform nested loops are modeled as multidimensional data flow graphs (MDFGs). Full parallelism of the loop body, i.e., all nodes in the MDFG executed in parallel, substantially decreases the overall computation time. It is well known that, for one-dimensional DFGs, retiming cannot always achieve full parallelism. Other existing optimization techniques for nested loops also cannot always achieve it. This paper presents an important and counterintuitive result: it proves that full parallelism can always be obtained for MDFGs with more than one dimension. The result is obtained by transforming the MDFG into a new structure, using a restructuring process based on a multidimensional retiming technique. The theory and two algorithms for obtaining full parallelism are presented. Examples of the optimization of nested loops and digital signal processing designs demonstrate the effectiveness of the algorithms.
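
To make the idea of multidimensional retiming concrete, here is a minimal sketch in Python. It is not one of the paper's two algorithms. It assumes a common convention in which an edge u -> v with delay vector d(e) becomes d(e) + r(u) - r(v) after retiming, takes "full parallelism" to mean that no edge keeps a zero delay vector, and checks realizability with a candidate schedule vector s that has a positive dot product with every nonzero delay. The three-node graph, the retiming values, and all function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, Tuple

Vec = Tuple[int, int]    # two-dimensional delay vector
Edge = Tuple[str, str]   # (source node, target node)


def add(a: Vec, b: Vec) -> Vec:
    return (a[0] + b[0], a[1] + b[1])


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1])


def dot(a: Vec, b: Vec) -> int:
    return a[0] * b[0] + a[1] * b[1]


def retime(delays: Dict[Edge, Vec], r: Dict[str, Vec]) -> Dict[Edge, Vec]:
    """Apply the assumed rule d_r(u -> v) = d(u -> v) + r(u) - r(v)."""
    return {(u, v): sub(add(d, r[u]), r[v]) for (u, v), d in delays.items()}


def fully_parallel(delays: Dict[Edge, Vec]) -> bool:
    """True when every edge carries a nonzero delay vector."""
    return all(d != (0, 0) for d in delays.values())


def strictly_scheduled_by(delays: Dict[Edge, Vec], s: Vec) -> bool:
    """True when s has a positive dot product with every nonzero delay."""
    return all(dot(s, d) > 0 for d in delays.values() if d != (0, 0))


# Hypothetical loop body: a cycle A -> B -> C -> A whose only delay is (1, 1).
mdfg = {("A", "B"): (0, 0), ("B", "C"): (0, 0), ("C", "A"): (1, 1)}

# Hypothetical retiming: multiples of (0, 1), which is orthogonal to the
# schedule vector (1, 0) of the original graph.
r = {"A": (0, 2), "B": (0, 1), "C": (0, 0)}

retimed = retime(mdfg, r)
print(retimed)
print(fully_parallel(mdfg), fully_parallel(retimed))
print(strictly_scheduled_by(retimed, (2, 1)))
```

In this sketch the sum of the delays around the cycle stays (1, 1), as retiming requires, yet every edge ends up with a nonzero delay vector. The one-dimensional analogue, a three-edge cycle holding a single nonnegative integer delay, cannot be retimed this way, which is consistent with the contrast the abstract draws between DFGs and MDFGs.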