An evaluation of directory schemes for cache coherence
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Visualizing Performance Debugging
Computer
Techniques for efficient inline tracing on a shared-memory multiprocessor
SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Quartz: a tool for tuning parallel program performance
SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
SPLASH: Stanford parallel applications for shared-memory
ACM SIGARCH Computer Architecture News
MemSpy: analyzing memory system bottlenecks in programs
SIGMETRICS '92/PERFORMANCE '92 Proceedings of the 1992 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
The RP3 program visualization environment
IBM Journal of Research and Development
Analyzing and tuning memory performance in sequential and parallel programs
Analyzing and tuning memory performance in sequential and parallel programs
Portable Programs for Parallel Processors
Portable Programs for Parallel Processors
Mtool: An Integrated System for Performance Debugging Shared Memory Multiprocessor Applications
IEEE Transactions on Parallel and Distributed Systems
Analytical Prediction of Performance for Cache Coherence Protocols
IEEE Transactions on Computers
Performance Tuning Software DSM Applications using Visualisation
The Journal of Supercomputing
Experiences in modeling and simulation of computer architectures in DEVS
Transactions of the Society for Computer Simulation International - Recent advances in DEVS methodology--part II
A model for parallel simulation of distributed shared memory
MASCOTS '96 Proceedings of the 4th International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
Detailed cache coherence characterization for OpenMP benchmarks
Proceedings of the 18th annual international conference on Supercomputing
Source-Code-Correlated Cache Coherence Characterization of OpenMP Benchmarks
IEEE Transactions on Parallel and Distributed Systems
DeFT: Design space exploration for on-the-fly detection of coherence misses
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Cache misses due to coherence actions are often the major source for performance degradation in cache coherent multiprocessors. It is often difficult for the programmer to take cache coherence into account when writing the program since the resulting access pattern is not apparent until the program is executed.SM-prof is a performance analysis tool that addresses this problem by visualising the shared data access pattern in a diagram with links to the source code lines causing performance degrading access patterns. The execution of a program is divided into time slots and each data block is classified based on the accesses made to the block during a time slot. This enables the programmer to follow the execution over time and it is possible to track the exact position responsible for accesses causing many cache misses related to coherence actions.Matrix multiplication and the MP3D application from SPLASH are used to illustrate the use of SM-prof. For MP3D, SM-prof revealed performance limitations that resulted in a performance improvement of over 75%.The current implementation is based on program-driven simulation in order to achieve non-intrusive profiling. If a small perturbation of the program execution is acceptable, it is also possible to use software tracing techniques given that a data address can be related to the originating instruction.