Initial implementations of parallel programs typically yield disappointing performance. Tuning to improve performance is thus a significant part of the parallel programming process. The effort required to tune a parallel program, and the level of performance that is eventually achieved, both depend heavily on the quality of the instrumentation that is available to the programmer.

This paper describes Quartz, a new tool for tuning parallel program performance on shared memory multiprocessors. The philosophy underlying Quartz was inspired by that of the sequential UNIX tool gprof: to appropriately direct the attention of the programmer by efficiently measuring just those factors that are most responsible for performance and by relating these metrics to one another and to the structure of the program. This philosophy is even more important in the parallel domain than in the sequential domain, because of the dramatically greater number of possible metrics and the dramatically increased complexity of program structures.

The principal metric of Quartz is normalized processor time: the total processor time spent in each section of code divided by the number of other processors that are concurrently busy when that section of code is being executed. Tied to the logical structure of the program, this metric provides a “smoking gun” pointing towards those areas of the program most responsible for poor performance. This information can be acquired efficiently by checkpointing to memory the number of busy processors and the state of each processor, and then statistically sampling these using a dedicated processor.

In addition to describing the design rationale, functionality, and implementation of Quartz, the paper examines how Quartz would be used to solve a number of performance problems that have been reported as being frequently encountered, and describes a case study in which Quartz was used to significantly improve the performance of a CAD circuit verifier.
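
The normalized processor time metric described above can be illustrated with a small sketch. This is not Quartz's implementation; it is a minimal Python example under stated assumptions. The function name `normalized_processor_time`, the snapshot layout (one list entry per processor, holding a section name or `None` for an idle processor), and the fixed sampling `interval` are all hypothetical. The abstract divides by the number of other busy processors; the sketch counts the executing processor as well, so that the divisor is never zero.

```python
from collections import defaultdict


def normalized_processor_time(samples, interval=1.0):
    """Accumulate normalized processor time per code section.

    samples:  list of snapshots; each snapshot lists, per processor, the
              section it is executing, or None if the processor is idle
              (hypothetical layout chosen for this sketch).
    interval: time represented by one snapshot (assumed constant).
    """
    totals = defaultdict(float)
    for snapshot in samples:
        # Assumption: the divisor includes the executing processor itself.
        busy = sum(1 for section in snapshot if section is not None)
        for section in snapshot:
            if section is not None:
                # Time spent while few processors are busy weighs more.
                totals[section] += interval / busy
    return dict(totals)


# Three snapshots of a four-processor machine (made-up data).
samples = [
    ["setup", None, None, None],         # serial phase: one busy processor
    ["solve", "solve", "solve", "solve"],  # fully parallel phase
    ["solve", "solve", None, None],      # load imbalance: two busy
]
result = normalized_processor_time(samples)
# result == {"setup": 1.0, "solve": 2.0}
```

In raw processor time, `solve` accounts for six sample intervals and `setup` for one. After normalization the ratio is two to one, so the serial `setup` phase stands out more clearly as a candidate for tuning.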