The ability of performance technology to keep pace with the growing complexity of parallel and distributed systems depends on robust performance frameworks that can at once provide system-specific performance capabilities and support high-level performance problem solving. Flexibility and portability in empirical methods and processes are influenced primarily by the strategies available for instrumentation and measurement, and by how effectively they are integrated and composed. This paper presents the TAU (Tuning and Analysis Utilities) parallel performance system and describes how it addresses diverse requirements for performance observation and analysis.