We describe a new approach to performance debugging that focuses on automatically detecting unnecessary or excessive synchronization. Our prototype performance debugger implements this approach and reports the excess synchronization back to the user at the source level. We describe the debugger and report results from applying it to a suite of programs: the type and amount of excess synchronization found in each program, and the effect that eliminating this synchronization had on the program's performance. In one case, eliminating the excess synchronization identified by the debugger reduced execution time by 41%.
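To make the notion of "excess synchronization" concrete, the following is a minimal sketch of one possible check, not the debugger described above. It assumes a hypothetical trace format in which each barrier-delimited phase maps a thread id to the list of (kind, address) accesses that thread made, and it treats a barrier as excess when no cross-thread data dependence spans it. The names `Phase`, `barrier_is_needed`, and `excess_barriers` are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical trace format (an assumption for illustration):
# an access is (kind, address), where kind is "r" (read) or "w" (write).
Access = Tuple[str, str]
# A phase holds the accesses each thread made between two consecutive barriers.
Phase = Dict[int, List[Access]]


def barrier_is_needed(before: Phase, after: Phase) -> bool:
    """Return True if the barrier separating two phases orders a dependence.

    The barrier is needed when some thread touches an address before the
    barrier, a different thread touches the same address after it, and at
    least one of the two accesses is a write.
    """
    for t1, accesses1 in before.items():
        for t2, accesses2 in after.items():
            if t1 == t2:
                continue  # program order already orders same-thread accesses
            for kind1, addr1 in accesses1:
                for kind2, addr2 in accesses2:
                    if addr1 == addr2 and "w" in (kind1, kind2):
                        return True
    return False


def excess_barriers(phases: List[Phase]) -> List[int]:
    """Return indices i of barriers (between phases i and i+1) that order nothing.

    Each barrier is judged in isolation, assuming every other barrier stays
    in place; removing several reported barriers at once would need a
    re-check across the merged phases.
    """
    return [
        i
        for i in range(len(phases) - 1)
        if not barrier_is_needed(phases[i], phases[i + 1])
    ]
```

For example, if every thread only reads back its own writes across a barrier, the sketch reports that barrier as excess. A real tool would also have to handle locks, flags, and dependences that differ between runs or inputs, which this toy check ignores.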