Towards scalable event tracing for high end systems
HPCC'07 Proceedings of the Third international conference on High Performance Computing and Communications
Accurate performance analysis of high-end systems requires event-based traces to correctly identify the root cause of many of the complex performance problems that arise on these highly parallel systems. These high-end architectures contain tens to hundreds of thousands of processors, pushing application scalability challenges to new heights. Unfortunately, collecting event-based data presents scalability challenges of its own: the large volume of collected data increases tool overhead and produces data files that are difficult to store and analyze. Our solution to these problems is a new measurement technique called trace profiling, which collects the information needed to diagnose performance problems that traditionally require traces, but at a greatly reduced data volume. The trace profiling technique reduces the amount of data stored by capitalizing on the repeated behavior of programs and on the similarity of the behavior and performance of parallel processes in an application run. Trace profiling is a hybrid between profiling and tracing: it collects summary information about the event patterns in an application run. Because the data has already been classified into behavior categories, we can present reduced, partially analyzed performance data to the user, highlighting the performance behaviors that account for most of the execution time.
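The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a trace is a list of (event name, duration) pairs, that a chosen marker event (here a hypothetical "barrier") closes each repeated segment, and that two segments belong to the same behavior category when their event-name sequences match exactly. The function names `trace_profile`, `merge_profiles`, and `top_patterns` are illustrative inventions.

```python
from collections import defaultdict


def trace_profile(events, segment_marker):
    """Collapse one process's event stream into per-pattern summaries.

    events: iterable of (event_name, duration) pairs in execution order.
    segment_marker: event name that closes a segment (an assumption of
        this sketch, e.g. a barrier ending each iteration).
    Returns {pattern: {"count": int, "total_time": number}}, where pattern
    is the tuple of event names seen in one segment.
    """
    summary = defaultdict(lambda: {"count": 0, "total_time": 0})
    names, elapsed = [], 0
    for name, duration in events:
        names.append(name)
        elapsed += duration
        if name == segment_marker:
            entry = summary[tuple(names)]
            entry["count"] += 1
            entry["total_time"] += elapsed
            names, elapsed = [], 0
    if names:  # trailing events that never reached a marker
        entry = summary[tuple(names)]
        entry["count"] += 1
        entry["total_time"] += elapsed
    return dict(summary)


def merge_profiles(profiles):
    """Combine per-process summaries, exploiting cross-process similarity."""
    merged = defaultdict(lambda: {"count": 0, "total_time": 0, "processes": 0})
    for profile in profiles:
        for pattern, stats in profile.items():
            entry = merged[pattern]
            entry["count"] += stats["count"]
            entry["total_time"] += stats["total_time"]
            entry["processes"] += 1
    return dict(merged)


def top_patterns(merged, n=3):
    """Return the n patterns that account for the most execution time."""
    ranked = sorted(merged.items(), key=lambda kv: kv[1]["total_time"], reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    # Hypothetical trace: three identical iterations, then one I/O phase.
    # Durations are integers in microseconds.
    iteration = [("compute", 2000), ("send", 500), ("barrier", 100)]
    events = iteration * 3 + [("io", 1000), ("barrier", 100)]
    # Pretend four processes behaved identically.
    merged = merge_profiles([trace_profile(events, "barrier")] * 4)
    for pattern, stats in top_patterns(merged):
        print(pattern, stats)
```

In this sketch the stored data grows with the number of distinct patterns rather than with the number of events or processes, which is the kind of reduction the abstract describes. A real tool would likely need a tolerance for matching segments whose timings or event sequences differ slightly; exact matching is used here only to keep the example short.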