Reducing synchronization overhead in parallel simulation
PADS '96 Proceedings of the tenth workshop on Parallel and distributed simulation
Synchronization is a significant cost in many parallel programs, and it can become a major bottleneck when handled in a centralized fashion using traditional shared-memory constructs such as barriers. In a parallel time-stepped simulation, the use of global synchronization primitives limits scalability, increases sensitivity to load imbalance, and reduces the potential for exploiting locality to improve cache behavior.

This paper presents the results of an initial one-application study quantifying the costs and performance benefits of distributed, nearest-neighbor synchronization. The application studied, MP3D, is a particle-based wind tunnel simulation. Our results for this one application on current shared-memory multiprocessors show a significant decrease in synchronization time using these techniques. We prototyped an application-independent library that implements distributed synchronization. The library allows a variety of parallel simulations to exploit these techniques without increasing the application programming effort beyond that of conventional approaches.
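
The idea of replacing a global barrier with nearest-neighbor synchronization in a time-stepped simulation can be illustrated with a minimal sketch. This is not the library described in the abstract. The names `NeighborSync`, `wait_for`, `finish_step`, and `run_simulation` are hypothetical, and the sketch assumes a one-dimensional chain of partitions in which each worker depends only on its left and right neighbors. Python threads are used only to show the protocol, not to measure performance.

```python
import threading


class NeighborSync:
    """Hypothetical helper: one step counter and condition variable per worker."""

    def __init__(self, n_workers):
        self.steps_done = [0] * n_workers
        self.conds = [threading.Condition() for _ in range(n_workers)]

    def wait_for(self, neighbor, step):
        # Block until `neighbor` has completed at least `step` time steps.
        cond = self.conds[neighbor]
        with cond:
            cond.wait_for(lambda: self.steps_done[neighbor] >= step)

    def finish_step(self, me):
        # Publish that worker `me` completed one more step and wake its waiters.
        with self.conds[me]:
            self.steps_done[me] += 1
            self.conds[me].notify_all()


def run_simulation(n_workers=4, n_steps=5):
    sync = NeighborSync(n_workers)
    max_lead = [0] * n_workers  # slot i is written only by worker i

    def worker(i):
        # Assumed topology: a chain, so the end workers have one neighbor.
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < n_workers]
        for step in range(n_steps):
            for j in neighbors:
                # Wait only on adjacent workers, never on all of them.
                sync.wait_for(j, step)
                # Record how far ahead the neighbor is at this moment.
                max_lead[i] = max(max_lead[i], sync.steps_done[j] - step)
            # The per-step simulation work for partition i would go here.
            sync.finish_step(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sync.steps_done, max(max_lead)
```

In this sketch adjacent workers never drift more than one step apart, while distant workers may drift further. That slack is what lets a slow partition delay only its neighbors instead of stalling every processor at a central barrier.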