Scheduling in many application domains involves optimizing multiple performance metrics. For example, application workflows with real-time constraints have strict throughput requirements and should also have low latency (response time). In this paper, we present a novel algorithm for scheduling workflows that operate on a stream of input data. The algorithm targets two performance metrics, latency and throughput: it minimizes the latency of a workflow while satisfying strict throughput requirements. We also describe how to adapt this approach to the converse problem of maximizing throughput while meeting latency requirements. We leverage pipelined, task, and data parallelism in a coordinated manner to meet these objectives. We also investigate how task duplication can alleviate communication overheads in the pipelined schedule for workflows with different characteristics. The algorithm is designed for a realistic bounded multi-port communication model, in which each processor can communicate simultaneously with at most k distinct processors. We evaluated it experimentally on synthetic benchmarks and on benchmarks derived from real applications. In these experiments, our algorithm consistently produces lower-latency schedules that meet the throughput requirements, even in cases where previously proposed schemes fail to do so.
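To make the latency/throughput tension concrete, here is a minimal sketch. It is not the paper's algorithm and rests on simplifying assumptions. The workflow is a linear chain of stages rather than a general DAG. A stage is replicated so that consecutive data items go to different processors in round-robin order. Each stage's output has a fixed per-item transfer time. All names (`Stage`, `replicas_for_throughput`, `throughput`, `latency`, `respects_k_port`) are hypothetical. The sketch shows that replication can raise throughput to a target rate without shortening the latency of any single item. It also shows a conservative check of the bounded k-port communication constraint.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Stage:
    """Hypothetical stage of a linear pipelined workflow (not the paper's model)."""
    name: str
    work: float      # compute time for one data item on one processor
    comm_out: float  # time to ship one item's output to the next stage (0 for the last stage)


def replicas_for_throughput(stages, target_throughput):
    """Fewest round-robin replicas per stage so every stage keeps up with the input rate.

    With r replicas, a stage accepts a new item every work / r time units,
    so it needs r >= work / period, where period = 1 / target_throughput.
    """
    period = 1.0 / target_throughput
    counts = []
    for s in stages:
        if s.comm_out > period:
            # In this simple model, replication does not shorten a single transfer.
            raise ValueError(f"output of {s.name} is too slow for the target throughput")
        counts.append(max(1, math.ceil(s.work / period)))
    return counts


def throughput(stages, replicas):
    """Steady-state items per time unit: the inverse of the slowest stage or transfer."""
    slowest_stage = max(s.work / r for s, r in zip(stages, replicas))
    slowest_transfer = max(s.comm_out for s in stages)
    return 1.0 / max(slowest_stage, slowest_transfer)


def latency(stages):
    """Time for one item to traverse the chain; replication does not reduce it."""
    return sum(s.work + s.comm_out for s in stages)


def respects_k_port(edges, k):
    """Conservative check: no processor has more than k distinct communication peers.

    The bounded multi-port model limits *simultaneous* peers; counting all peers
    over the whole schedule is a stricter, assumed simplification.
    """
    peers = defaultdict(set)
    for a, b in edges:
        if a != b:
            peers[a].add(b)
            peers[b].add(a)
    return all(len(p) <= k for p in peers.values())


if __name__ == "__main__":
    chain = [Stage("filter", 4.0, 1.0), Stage("fft", 6.0, 1.0), Stage("detect", 2.0, 0.0)]
    r = replicas_for_throughput(chain, target_throughput=0.5)  # one item every 2 time units
    print(r)                     # [2, 3, 1]
    print(throughput(chain, r))  # 0.5
    print(latency(chain))        # 14.0
```

In the example, the target rate is reached by replicating the two heavy stages, but each item still takes 14 time units end to end. Reducing that figure would need other levers the abstract mentions. Examples are splitting an item across processors or duplicating tasks to eliminate transfers. This toy model does not capture them.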