A mixed-integer linear programming problem which is efficiently solvable. Journal of Algorithms.
A fast static scheduling algorithm for DAGs on an unbounded number of processors. Proceedings of the 1991 ACM/IEEE conference on Supercomputing.
Optimal mapping of sequences of data parallel tasks. PPOPP '95: Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming.
Optimal latency-throughput tradeoffs for data parallel pipelines. Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures.
Generalized multiprocessor scheduling for directed acyclic graphs. Proceedings of the 1994 ACM/IEEE conference on Supercomputing.
Communication and memory requirements as the basis for mapping task and data parallel programs. Proceedings of the 1994 ACM/IEEE conference on Supercomputing.
Optimal Processor Assignment for a Class of Pipelined Computations. IEEE Transactions on Parallel and Distributed Systems.
Scheduling constrained dynamic applications on clusters. SC '99: Proceedings of the 1999 ACM/IEEE conference on Supercomputing.
Multimedia applications (also called multimedia systems) operate on datastreams, which are periodic sequences of data elements called datasets. A large class of multimedia applications can be described by the macro-dataflow graph model, in which nodes represent parallelizable tasks and arcs represent communication. This paper examines how such applications can be compiled to run efficiently on parallel machines by optimizing both throughput (T) and latency (L) with two techniques based on task speedup functions. The first step chooses an appropriate pipeline structure for the system (task clustering). The second step exploits the dataset parallelism intrinsic in the periodic datastream and, for each clustering, runs multiple datasets in parallel (task/cluster multiplicity).

The key findings of this research are:

(A) The best task clustering depends on system throughput. In general, skewed parallelism profiles are desirable, i.e., tasks with good speedup and tasks with poor speedup should be placed in separate clusters. Indeed, maximal throughput and minimal latency can be attained simultaneously in the limiting case of a maximally skewed distribution. This result can be viewed as a generalization of Amdahl's law for real-time applications.

(B) The optimal dataset multiplicity for a specific clustering can be determined by extending retiming theory [1] to include parallel resource allocation. In this process, counter-intuitive relaxation regions often appear, in which increasing dataset multiplicity simultaneously increases throughput and reduces latency (a free lunch).

The techniques have been used to compile real-time image-processing problems for an NCUBE-2 multiprocessor and show substantial performance gains.
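The interplay between task clustering, cluster multiplicity, throughput, and latency can be illustrated with a small toy model. This is a minimal sketch under assumptions that are ours, not the paper's. Each task's time on p processors is its single-processor work divided by a speedup function S(p), modeled here with Amdahl's law. Tasks in a cluster run back to back on that cluster's processors. A cluster with multiplicity k is split into k equal copies that take turns on successive datasets. Communication and queuing delays are ignored. All names (`Task`, `Cluster`, `amdahl_speedup`, `evaluate`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

SpeedupFn = Callable[[float], float]


def amdahl_speedup(serial_fraction: float) -> SpeedupFn:
    """Assumed speedup model S(p) = 1 / (f + (1 - f) / p) with serial fraction f."""
    def speedup(p: float) -> float:
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)
    return speedup


@dataclass
class Task:
    name: str
    work: float          # execution time on one processor (arbitrary units)
    speedup: SpeedupFn   # S(p): speedup on p processors


@dataclass
class Cluster:
    """A pipeline stage (assumed model): its tasks run back to back on its processors."""
    tasks: List[Task]
    processors: int
    multiplicity: int = 1  # k copies, each handling every k-th dataset

    def __post_init__(self) -> None:
        if self.processors % self.multiplicity != 0:
            raise ValueError("processors must divide evenly among the copies")

    def stage_time(self) -> float:
        # Each copy owns processors / multiplicity processors.
        p = self.processors / self.multiplicity
        return sum(t.work / t.speedup(p) for t in self.tasks)

    def throughput(self) -> float:
        # k copies on different datasets finish k datasets every stage_time.
        return self.multiplicity / self.stage_time()


def evaluate(pipeline: List[Cluster]) -> Tuple[float, float]:
    """Return (throughput T, latency L), ignoring communication and queuing."""
    throughput = min(c.throughput() for c in pipeline)  # slowest stage bounds T
    latency = sum(c.stage_time() for c in pipeline)     # a dataset visits every stage
    return throughput, latency


if __name__ == "__main__":
    a = Task("A", 16.0, amdahl_speedup(0.0))  # perfectly parallel: S(p) = p
    b = Task("B", 16.0, amdahl_speedup(1.0))  # fully serial: S(p) = 1

    # 16 processors, both tasks in one cluster.
    print(evaluate([Cluster([a, b], 16, 1)]))   # T = 1/17, L = 17.0
    print(evaluate([Cluster([a, b], 16, 16)]))  # (0.5, 32.0)
    # Skewed clustering: good- and poor-speedup tasks in separate clusters.
    print(evaluate([Cluster([a], 8, 1), Cluster([b], 8, 8)]))  # (0.5, 18.0)
```

In this toy setting, keeping both tasks in one cluster and raising its multiplicity to 16 reaches T = 0.5 only at L = 32. Separating the good-speedup task A from the poor-speedup task B reaches the same throughput with L = 18, which is consistent with the preference for skewed clustering described above. The sketch does not model communication or the retiming-based multiplicity optimization, so it does not reproduce the free-lunch relaxation regions.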