Rate-optimal compile-time multiprocessor scheduling of iterative dataflow programs suitable for real-time signal processing applications is discussed. It is shown that recursions or loops in the programs lead to an inherent lower bound on the achievable iteration period, referred to as the iteration bound. A multiprocessor schedule is rate-optimal if its iteration period equals the iteration bound. Systematic unfolding of iterative dataflow programs is proposed, and properties of unfolded dataflow programs are studied. Unfolding increases the number of tasks in a program, unravels the hidden concurrency in iterative dataflow programs, and can reduce the iteration period. A special class of iterative dataflow programs, referred to as perfect-rate programs, is introduced. Each loop in these programs has a single register. Perfect-rate programs can always be scheduled rate-optimally, requiring no retiming or unfolding transformation. It is also shown that unfolding an arbitrary program by an optimum unfolding factor transforms it into an equivalent perfect-rate program, which can then be scheduled rate-optimally. This optimum unfolding factor is the least common multiple of the numbers of registers (or delays) in all loops of the program and is independent of the node execution times. An upper bound on the number of processors for rate-optimal scheduling is given.
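The quantities named in the abstract can be illustrated with a small Python sketch. It is not the paper's algorithm: it assumes the loops of the dataflow graph have already been enumerated and that each loop is given as a hypothetical pair (total node execution time, number of registers). The function names are illustrative, and `unfold_loop` relies on a standard unfolding property (a loop with d registers unfolded by J yields gcd(d, J) loops, each with d / gcd(d, J) registers) that the abstract does not state.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def iteration_bound(loops):
    """Largest ratio of loop execution time to loop register count."""
    return max(Fraction(time, delays) for time, delays in loops)


def optimum_unfolding_factor(loops):
    """Least common multiple of the register counts of all loops."""
    return reduce(lambda a, b: a * b // gcd(a, b), (d for _, d in loops), 1)


def unfold_loop(time, delays, factor):
    """Loops produced by unfolding one loop (assumed standard property)."""
    g = gcd(delays, factor)
    # g loops, each holding factor/g copies of every node and delays/g registers
    return [((factor // g) * time, delays // g) for _ in range(g)]


def is_perfect_rate(loops):
    """A perfect-rate program has exactly one register in every loop."""
    return all(delays == 1 for _, delays in loops)


# Hypothetical program with two loops: (execution time, registers)
loops = [(4, 2), (5, 3)]
J = optimum_unfolding_factor(loops)
unfolded = [u for t, d in loops for u in unfold_loop(t, d, J)]

print(iteration_bound(loops))          # bound per original iteration
print(J)                               # unfolding factor
print(is_perfect_rate(unfolded))       # every unfolded loop has one register
print(iteration_bound(unfolded) / J)   # unfolded bound, per original iteration
```

In this example the unfolding factor depends only on the register counts 2 and 3, which matches the abstract's claim that it is independent of the node execution times. One unfolded iteration covers J original iterations, so dividing the unfolded bound by J recovers the original iteration bound.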