Static Rate-Optimal Scheduling of Iterative Data-Flow Programs Via Optimum Unfolding

Authors:
Keshab K. Parhi; David G. Messerschmitt
Venue:
IEEE Transactions on Computers
Year:
1991



Abstract

Rate-optimal compile-time multiprocessor scheduling of iterative dataflow programs suitable for real-time signal processing applications is discussed. It is shown that recursions or loops in the programs lead to an inherent lower bound on the achievable iteration period, referred to as the iteration bound. A multiprocessor schedule is rate-optimal if its iteration period equals the iteration bound. Systematic unfolding of iterative dataflow programs is proposed, and properties of unfolded dataflow programs are studied. Unfolding increases the number of tasks in a program, unravels the hidden concurrency in iterative dataflow programs, and can reduce the iteration period. A special class of iterative dataflow programs, referred to as perfect-rate programs, is introduced; each loop in these programs has a single register. Perfect-rate programs can always be scheduled rate-optimally, with no retiming or unfolding transformation required. It is also shown that unfolding an arbitrary program by an optimum unfolding factor transforms it into an equivalent perfect-rate program, which can then be scheduled rate-optimally. This optimum unfolding factor is the least common multiple of the numbers of registers (or delays) in all loops and is independent of the node execution times. An upper bound on the number of processors required for rate-optimal scheduling is given.
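
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the standard definition of the iteration bound (the maximum, over all loops, of loop computation time divided by loop register count) and the usual edge-unfolding rule, in which an edge U -> V with w delays becomes J edges U_i -> V_((i + w) mod J), each carrying floor((i + w) / J) delays. The function names and the loop and edge data are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def iteration_bound(loops):
    """Iteration bound for loops given as (computation_time, register_count).

    Assumes the standard definition: max over loops of time / registers.
    """
    return max(Fraction(time, regs) for time, regs in loops)


def optimum_unfolding_factor(loops):
    """Least common multiple of the register counts of all loops."""
    return reduce(lambda a, b: a * b // gcd(a, b), (regs for _, regs in loops))


def unfold(edges, factor):
    """Unfold edges (u, v, delays) by `factor` using the usual rule.

    Node u of copy i is written (u, i). Each original edge with w delays
    yields `factor` edges (u, i) -> (v, (i + w) % factor), each carrying
    (i + w) // factor delays.
    """
    return [
        ((u, i), (v, (i + w) % factor), (i + w) // factor)
        for (u, v, w) in edges
        for i in range(factor)
    ]


# Hypothetical program with two loops: (total node time, registers).
example_loops = [(4, 2), (5, 3)]

# Hypothetical two-node loop A -> B -> A with two registers on B -> A.
example_edges = [("A", "B", 0), ("B", "A", 2)]

if __name__ == "__main__":
    print("iteration bound:", iteration_bound(example_loops))
    print("optimum unfolding factor:", optimum_unfolding_factor(example_loops))
    for edge in unfold(example_edges, 2):
        print(edge)
```

In the hypothetical two-node example, unfolding by 2 (the register count of the only loop) splits the loop into two separate loops, each holding a single register, which matches the perfect-rate property described in the abstract.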