We explore the emerging application area of physics-based simulation for computer animation and visual special effects. In particular, we examine its parallelization potential and characterize its behavior on a chip multiprocessor (CMP). Applications in this domain model and simulate natural phenomena, and often direct visual components of motion pictures. We study a set of three workloads that exemplify the span and complexity of physical simulation applications used in a production environment: fluid dynamics, facial animation, and cloth simulation. They are computationally demanding, requiring from a few seconds to several minutes to simulate a single frame; therefore, they can benefit greatly from the acceleration possible with large-scale CMPs. Starting with serial versions of these applications, we parallelize code accounting for at least 96% of the serial execution time, targeting a large number of threads. We then study the most expensive modules using a simulated 64-core CMP. For the code representing key modules, we achieve parallel scaling of 45x, 50x, and 30x for fluid, face, and cloth simulations, respectively. The modules have a spectrum of parallel task granularity and locking behavior, and all but one are dominated by loop-level parallelism. Many modules operate on streams of data. In some cases, modules iterate over their data, leading to significant temporal locality. This streaming behavior leads to very high on-die and main memory bandwidth requirements. Finally, most modules have little inter-thread communication since they are data-parallel, but a few require heavy communication between data-parallel operations.
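To put the reported figures in context, the following is a minimal sketch, not taken from the paper, that applies two standard textbook formulas to the numbers in the abstract. Amdahl's law is an assumption introduced here for illustration; the abstract does not state that the authors used it. The function names and the `module_scaling` dictionary are hypothetical. Note that the 96% figure is a lower bound on the parallelized fraction of the whole application, while the 45x, 50x, and 30x figures apply only to the key modules, so the two results are not directly comparable.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on whole-program speedup under Amdahl's law.

    Assumes the parallel part scales perfectly and the rest stays serial.
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def parallel_efficiency(speedup: float, cores: int) -> float:
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / cores


CORES = 64  # size of the simulated CMP stated in the abstract

# Module-level scaling figures quoted in the abstract.
module_scaling = {"fluid": 45.0, "face": 50.0, "cloth": 30.0}

# Whole-application bound if exactly 96% of serial time were parallelized.
bound = amdahl_speedup(0.96, CORES)

# Efficiency of the key modules relative to ideal 64x scaling.
efficiencies = {
    name: parallel_efficiency(speedup, CORES)
    for name, speedup in module_scaling.items()
}

if __name__ == "__main__":
    print(f"Amdahl bound at 96% parallel, {CORES} cores: {bound:.2f}x")
    for name, eff in efficiencies.items():
        print(f"{name}: {eff:.1%} of ideal scaling")
```

Under these assumptions, a parallel fraction of exactly 96% would cap whole-application speedup at roughly 18x on 64 cores, which illustrates why the remaining serial code matters even when the key modules reach 47% to 78% of ideal scaling.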