Compilation of parallel multimedia computations—extending retiming theory and Amdahl's law

  • Authors:
  • G. Srinivasa N. Prasanna

  • Affiliations:
  • 7D-311, Lucent Technologies, 600 Mountain Ave., PO Box 636, Murray Hill, NJ

  • Venue:
  • PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
  • Year:
  • 1997

Abstract

Multimedia applications (also called multimedia systems) operate on datastreams, which are periodic sequences of data elements called datasets. A large class of multimedia applications is described by the macro-dataflow graph model, with nodes representing parallelizable tasks and arcs representing communication. This paper examines how such multimedia applications can be compiled to run efficiently on parallel machines by optimizing both throughput (T) and latency (L), using two techniques based on task speedup functions. The first chooses an appropriate pipeline structure for the system (task clustering). The second exploits the dataset parallelism intrinsic in the periodic datastream and runs multiple datasets in parallel (task/cluster multiplicity) for each clustering. The key findings of this research are:

A. The best task clustering depends on system throughput. In general, skewed parallelism profiles are desirable, i.e., tasks with good speedup and tasks with poor speedup are placed in separate clusters. Indeed, the maximal throughput and minimal latency can be simultaneously attained in the limiting case of a maximally skewed distribution. This result can be viewed as a generalization of Amdahl's law for real-time applications.

B. Optimal dataset multiplicity for a specific clustering can be determined by extending retiming theory [1] to include parallel resource allocation. In this process, counter-intuitive relaxation regions often appear, wherein increasing dataset multiplicity increases throughput and simultaneously reduces latency (a free lunch).

The techniques have been used for compiling real-time image-processing problems on an NCUBE-2 multiprocessor, and show substantial performance gains.
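The interplay between throughput, latency, and dataset multiplicity can be illustrated with a minimal sketch. It is not the paper's method: it assumes each cluster follows a plain Amdahl speedup function, that stage throughput is multiplicity divided by per-dataset time, and that communication cost is zero. The names `Stage`, `stage_time`, `throughput`, `latency`, and `processors_used` are hypothetical and introduced here only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One pipeline stage (task cluster). All fields are illustrative assumptions."""
    work: float               # time to process one dataset on one processor
    parallel_fraction: float  # Amdahl fraction of the work that parallelizes
    processors: int           # processors given to each dataset copy
    multiplicity: int         # datasets processed concurrently in this stage


def stage_time(stage: Stage) -> float:
    """Per-dataset time under Amdahl's law: serial part plus parallel part / p."""
    serial = 1.0 - stage.parallel_fraction
    return stage.work * (serial + stage.parallel_fraction / stage.processors)


def throughput(stages) -> float:
    """Datasets per unit time; the slowest stage limits the pipeline."""
    return min(s.multiplicity / stage_time(s) for s in stages)


def latency(stages) -> float:
    """Time for one dataset to traverse all stages (communication ignored)."""
    return sum(stage_time(s) for s in stages)


def processors_used(stages) -> int:
    """Total processors: each concurrent dataset copy gets its own allocation."""
    return sum(s.processors * s.multiplicity for s in stages)


if __name__ == "__main__":
    # Same 4-processor budget, spent two ways on a task with poor speedup.
    wide = [Stage(1.0, 0.5, processors=4, multiplicity=1)]
    replicated = [Stage(1.0, 0.5, processors=1, multiplicity=4)]
    for name, cfg in (("wide", wide), ("replicated", replicated)):
        print(name, processors_used(cfg), throughput(cfg), latency(cfg))
```

In this toy model, with four processors and a task whose work is half serial, replicating the task across four datasets raises throughput from 1.6 to 4.0 datasets per unit time, while latency rises from 0.625 to 1.0. The sketch does not reproduce the relaxation regions described in the abstract, which depend on the paper's retiming extension.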