Physical simulation for animation and visual effects: parallelization and characterization for chip multiprocessors

  • Authors:
  • Christopher J. Hughes (Intel, Santa Clara, CA); Radek Grzeszczuk (Nokia Labs, Palo Alto, CA); Eftychios Sifakis (Stanford University, Stanford, CA); Daehyun Kim (Intel, Santa Clara, CA); Sanjeev Kumar (Intel, Santa Clara, CA); Andrew P. Selle (Stanford University, Stanford, CA); Jatin Chhugani (Intel, Santa Clara, CA); Matthew Holliman (Intel, Santa Clara, CA); Yen-Kuang Chen (Intel, Santa Clara, CA)

  • Venue:
  • Proceedings of the 34th Annual International Symposium on Computer Architecture (ISCA '07)
  • Year:
  • 2007

Abstract

We explore the emerging application area of physics-based simulation for computer animation and visual special effects. In particular, we examine its parallelization potential and characterize its behavior on a chip multiprocessor (CMP). Applications in this domain model and simulate natural phenomena, and are often direct visual components of motion pictures. We study a set of three workloads that exemplify the span and complexity of physical simulation applications used in a production environment: fluid dynamics, facial animation, and cloth simulation. They are computationally demanding, requiring from a few seconds to several minutes to simulate a single frame; therefore, they can benefit greatly from the acceleration possible with large-scale CMPs. Starting with serial versions of these applications, we parallelize code accounting for at least 96% of the serial execution time, targeting a large number of threads. We then study the most expensive modules using a simulated 64-core CMP. For the code representing key modules, we achieve parallel scaling of 45x, 50x, and 30x for fluid, face, and cloth simulations, respectively. The modules exhibit a spectrum of parallel task granularities and locking behaviors, and all but one are dominated by loop-level parallelism. Many modules operate on streams of data; in some cases, modules iterate over their data, leading to significant temporal locality. This streaming behavior imposes very high on-die and main memory bandwidth requirements. Finally, most modules have little inter-thread communication since they are data-parallel, but a few require heavy communication between data-parallel operations.
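
As a concrete illustration of the loop-level, data-parallel pattern the abstract highlights, the sketch below shows how a simulation module might stream over independent elements with an OpenMP parallel loop. It is not code from the paper; the Particle structure, the integrate_step function, and the physics are assumptions made purely for illustration.

    // A minimal sketch (assumed, not from the paper) of loop-level,
    // data-parallel simulation: each thread streams over an independent
    // slice of state, here an explicit Euler step over particles.
    // Build with: g++ -O2 -fopenmp sketch.cpp
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    struct Particle {
        float x, y, z;    // position
        float vx, vy, vz; // velocity
    };

    // Hypothetical update step; dt is the timestep, gravity a constant force.
    void integrate_step(std::vector<Particle>& particles, float dt, float gravity) {
        // Loop-level parallelism: iterations touch disjoint elements, so the
        // loop splits across CMP cores with no locking and no communication.
        #pragma omp parallel for
        for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(particles.size()); ++i) {
            Particle& p = particles[i];
            p.vz -= gravity * dt; // apply external force
            p.x  += p.vx * dt;    // advance position; each iteration reads and
            p.y  += p.vy * dt;    // writes a contiguous record, the streaming
            p.z  += p.vz * dt;    // access pattern that drives the bandwidth
        }                         // demands noted in the abstract
    }

    int main() {
        std::vector<Particle> ps(1 << 20);  // one million particles, zero-initialized
        for (int step = 0; step < 10; ++step)
            integrate_step(ps, 1.0f / 60.0f, 9.8f);
        std::printf("z of particle 0 after 10 steps: %f\n", ps[0].z);
        return 0;
    }

Because every iteration is independent, the only cross-core traffic in such a module is the memory stream itself, which is consistent with the abstract's pairing of high bandwidth demand with low inter-thread communication.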