Most scientific and digital signal processing (DSP) applications are recursive or iterative. Transformation techniques are usually applied to obtain optimal execution rates on parallel and/or pipelined systems. Retiming is a common and valuable transformation tool for one-dimensional problems in which loops are represented by data flow graphs (DFGs). In this paper, uniform nested loops are modeled as multidimensional data flow graphs (MDFGs). Full parallelism of the loop body, i.e., all nodes in the MDFG executed in parallel, substantially decreases the overall computation time. It is well known that, for one-dimensional DFGs, retiming cannot always achieve full parallelism. Other existing optimization techniques for nested loops cannot always achieve full parallelism either. This paper presents an important and counter-intuitive result: full parallelism can always be obtained for MDFGs with more than one dimension. The result is obtained by transforming the MDFG into a new structure through a restructuring process based on multidimensional retiming. The theory and two algorithms for obtaining full parallelism are presented. Examples of optimizing nested loops and digital signal processing designs demonstrate the effectiveness of the algorithms.
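To make the idea of multidimensional retiming concrete, the following is a minimal sketch and not the paper's algorithms. It assumes a common convention in which each edge u -> v carries an integer delay vector d(e), a retiming function r assigns a vector to every node, and the retimed delay is d(e) + r(u) - r(v). Under that assumption, the loop body is fully parallel when no edge is left with an all-zero delay. The two-node graph, the retiming values, the schedule vector (1, 1), and all function names are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

Vec = Tuple[int, ...]
Edge = Tuple[str, str, Vec]  # (source node, target node, delay vector)


def retime(edges: List[Edge], r: Dict[str, Vec]) -> List[Edge]:
    """Apply a retiming r, assuming the convention d_r(e) = d(e) + r(u) - r(v)."""
    retimed = []
    for u, v, d in edges:
        new_d = tuple(di + ru - rv for di, ru, rv in zip(d, r[u], r[v]))
        retimed.append((u, v, new_d))
    return retimed


def is_fully_parallel(edges: List[Edge]) -> bool:
    """True when no edge has an all-zero delay, i.e. no dependence inside one iteration."""
    return all(any(c != 0 for c in d) for _, _, d in edges)


def respects_schedule(edges: List[Edge], s: Vec) -> bool:
    """True when the schedule vector s has a positive inner product with every delay."""
    return all(sum(si * di for si, di in zip(s, d)) > 0 for _, _, d in edges)


# Hypothetical two-dimensional MDFG: A feeds B within the same iteration
# (zero delay), and B feeds A across iterations with delay (1, 1).
edges = [("A", "B", (0, 0)), ("B", "A", (1, 1))]

# Hypothetical retiming that moves one delay along the second dimension through A.
r = {"A": (0, 1), "B": (0, 0)}

retimed = retime(edges, r)
print(retimed)
print(is_fully_parallel(edges), is_fully_parallel(retimed))
print(respects_schedule(retimed, (1, 1)))
```

In this toy graph the delay sum around the cycle A -> B -> A is (1, 1) both before and after retiming, since the r terms cancel around any cycle. The retiming only redistributes delays, so that every edge ends up with a non-zero delay vector.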