Portable programs for parallel processors
Portable programs for parallel processors
Non-intrusive and interactive profiling in parasight
PPEALS '88 Proceedings of the ACM/SIGPLAN conference on Parallel programming: experience with applications, languages and systems
Characterizing the synchronization behavior of parallel programs
PPEALS '88 Proceedings of the ACM/SIGPLAN conference on Parallel programming: experience with applications, languages and systems
Performance-Measurement Tools in a Multiprocessor Environment
IEEE Transactions on Computers
Determining average program execution times and their variance
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Journal of Parallel and Distributed Computing - Special issue: software tools for parallel programming and visualization
Quartz: a tool for tuning parallel program performance
SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Analyzing and visualizing performance of memory hierarchies
Parallel computer systems
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Tolerating latency through software-controlled prefetching in shared-memory multiprocessors
Journal of Parallel and Distributed Computing - Special issue on shared-memory multiprocessors
Instrumentation for a massively parallel MIMD application
Journal of Parallel and Distributed Computing
Profile-driven compilation
Multiprocessor performance debugging and memory bottlenecks
Multiprocessor performance debugging and memory bottlenecks
The art of computer programming, volume 1 (3rd ed.): fundamental algorithms
The art of computer programming, volume 1 (3rd ed.): fundamental algorithms
IPS-2: The Second Generation of a Parallel Program Measurement System
IEEE Transactions on Parallel and Distributed Systems
Gprof: A call graph execution profiler
SIGPLAN '82 Proceedings of the 1982 SIGPLAN symposium on Compiler construction
Performance observability
Volume rendering on scalable shared-memory MIMD architectures
VVS '92 Proceedings of the 1992 workshop on Volume visualization
Analytical performance prediction on multicomputers
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
Performance debugging using parallel performance predicates
PADD '93 Proceedings of the 1993 ACM/ONR workshop on Parallel and distributed debugging
Optimally profiling and tracing programs
ACM Transactions on Programming Languages and Systems (TOPLAS)
Architectural support for performance tuning: a case study on the SPARCcenter 2000
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Proceedings of the 1995 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Informing memory operations: providing memory performance feedback in modern processors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Integrating performance monitoring and communication in parallel computers
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Continuous profiling: where have all the cycles gone?
ACM Transactions on Computer Systems (TOCS)
Continuous profiling: where have all the cycles gone?
Proceedings of the sixteenth ACM symposium on Operating systems principles
Informing memory operations: memory performance feedback mechanisms and their applications
ACM Transactions on Computer Systems (TOCS)
Cache conscious programming in undergraduate computer science
SIGCSE '99 The proceedings of the thirtieth SIGCSE technical symposium on Computer science education
Efficient performance prediction for modern microprocessors
Proceedings of the 2000 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
The Journal of Supercomputing
Using hardware performance monitors to isolate memory bottlenecks
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
A time-stamping algorithm for efficient performance estimation of superscalar processors
Proceedings of the 2001 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Parallel performance prediction using lost cycles analysis
Proceedings of the 1994 ACM/IEEE conference on Supercomputing
IEEE Transactions on Software Engineering
MASCOTS '95 Proceedings of the 3rd International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems
SIGMA: a simulator infrastructure to guide memory analysis
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Performance visualization for distributed shared memory systems
Virtual shared memory for distributed architectures
A Performance Debugger for Eliminating Excess Synchronization in Shared-Memory Parallel Programs
MASCOTS '96 Proceedings of the 4th International Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
Data Centric Cache Measurement on the Intel ltanium 2 Processor
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Intermediately executed code is the key to find refactorings that improve temporal data locality
Proceedings of the 3rd conference on Computing frontiers
I/O system performance debugging using model-driven anomaly characterization
FAST'05 Proceedings of the 4th conference on USENIX Conference on File and Storage Technologies - Volume 4
Transaction-Based Communication-Centric Debug
NOCS '07 Proceedings of the First International Symposium on Networks-on-Chip
Memory behavior of an X11 window system
WTEC'94 Proceedings of the USENIX Winter 1994 Technical Conference on USENIX Winter 1994 Technical Conference
RDVIS: a tool that visualizes the causes of low locality and hints program optimizations
ICCS'05 Proceedings of the 5th international conference on Computational Science - Volume Part II
Hi-index | 0.00 |
The authors describe Mtool, a software tool for analyzing performance losses in shared memory parallel programs. Mtool augments a program with low overhead instrumentation which perturbs the program's execution as little as possible while generating enough information to isolate memory and synchronization bottlenecks. After running the instrumented version of the parallel program, the programmer can use Mtool's window-based user interface to view compute time, memory, and synchronization objects. The authors describe Mtool's low overhead instrumentation methods, memory bottleneck detection technique, and attention focusing mechanisms, contrast Mtool with other approaches, and offer a case study to demonstrate its effectiveness.