Implementation and performance of Munin
SOSP '91 Proceedings of the thirteenth ACM symposium on Operating systems principles
MemSpy: analyzing memory system bottlenecks in programs
SIGMETRICS '92/PERFORMANCE '92 Proceedings of the 1992 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Adaptive cache coherency for detecting migratory shared data
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
An adaptive cache coherence protocol optimized for migratory sharing
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Dynamic control of performance monitoring on large scale parallel systems
ICS '93 Proceedings of the 7th international conference on Supercomputing
Combined performance gains of simple cache protocol extensions
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The Stanford FLASH multiprocessor
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tempest and typhoon: user-level shared memory
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Fine-grain access control for distributed shared memory
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
CRL: high-performance all-software distributed shared memory
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
StormWatch: a tool for visualizing memory system protocols
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Teapot: language support for writing memory coherence protocols
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Informing memory operations: providing memory performance feedback in modern processors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
STiNG: a CC-NUMA computer system for the commercial marketplace
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Integrating performance monitoring and communication in parallel computers
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Shasta: a low overhead, software-only approach for supporting fine-grain shared memory
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Mapping performance data for high-level and data views of parallel program performance
ICS '96 Proceedings of the 10th international conference on Supercomputing
Optimizing communication in HPF programs on fine-grain distributed shared memory
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Relaxed consistency and coherence granularity in DSM systems: a performance evaluation
PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Application-specific protocols for user-level shared memory
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
Tempest: a substrate for portable parallel programs
COMPCON '95 Proceedings of the 40th IEEE Computer Society International Conference
MDL: A Language And Compiler For Dynamic Program Instrumentation
PACT '97 Proceedings of the 1997 International Conference on Parallel Architectures and Compilation Techniques
Hardware Support for Flexible Distributed Shared Memory
IEEE Transactions on Computers
Hierarchical fuzzy configuration of implementation strategies
Proceedings of the 1999 ACM symposium on Applied computing
Experiment management support for performance tuning
SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
Performance Tuning Software DSM Applications using Visualisation
The Journal of Supercomputing
On-Line Debugging and Performance Monitoring with Barriers
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
TAPE: a transactional application profiling environment
Proceedings of the 19th annual international conference on Supercomputing
A multi-level monitoring framework for stream-based coordination programs
ICA3PP'12 Proceedings of the 12th international conference on Algorithms and Architectures for Parallel Processing - Volume Part I
Hi-index | 0.00 |
This paper describes a new approach to finding performance bottlenecks in shared-memory parallel programs and its embodiment in the Paradyn Parallel Performance Tools running with the Blizzard fine-grain distributed shared memory system. This approach exploits the underlying system's cache coherence protocol to detect data sharing patterns that indicate potential performance bottlenecks and presents performance measurements in a data-centric manner. As a demonstration, Parodyn helped us improve the performance of a new shared-memory application program by a factor of four.