Developers of applications for today's massively parallel supercomputers often rely on performance analysis tools to reach scalable performance on thousands of processors. However, conventional tools for parallel performance analysis themselves scale poorly because of the large volume of data they must handle. In this paper, we discuss the scalability of MPI performance analysis on Blue Gene/L (BG/L), the world's fastest supercomputing platform. First, we present an experimental study of existing MPI performance tools that were ported to BG/L from other platforms. These tools fall into two categories: profiling tools, which collect timing summaries, and tracing tools, which collect a sequence of time-stamped events. Profiling tools produce small data volumes and can scale well, whereas tracing tools tend to scale poorly. We then describe a configurable MPI tracing tool developed for BG/L. By making trace generation configurable, the tool keeps the volume of trace data under control and significantly improves scalability.
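The difference in data volume between the two tool categories, and the effect of making trace generation configurable, can be illustrated with a small synthetic sketch. It is not the BG/L tool described above and does not call MPI: the class names `Profiler` and `ConfigurableTracer`, the `simulate` helper, and the two configuration knobs (a set of traced ranks and a per-rank event cap) are assumptions chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Profiler:
    """Hypothetical profiler: one [count, total_time] summary per call name."""
    summary: dict = field(default_factory=lambda: defaultdict(lambda: [0, 0.0]))

    def record(self, rank, call, start, end):
        entry = self.summary[call]
        entry[0] += 1
        entry[1] += end - start

    def size(self):
        # Number of stored records is bounded by the number of distinct calls.
        return len(self.summary)


@dataclass
class ConfigurableTracer:
    """Hypothetical tracer: keeps time-stamped events, limited by configuration."""
    traced_ranks: frozenset      # assumed knob: which ranks emit trace records
    max_events_per_rank: int     # assumed knob: cap on stored events per rank
    events: dict = field(default_factory=lambda: defaultdict(list))

    def record(self, rank, call, start, end):
        if rank not in self.traced_ranks:
            return
        if len(self.events[rank]) >= self.max_events_per_rank:
            return
        self.events[rank].append((call, start, end))

    def size(self):
        return sum(len(per_rank) for per_rank in self.events.values())


def simulate(sink, num_ranks, iterations):
    """Feed a synthetic stream of MPI-like call events into a sink."""
    t = 0.0
    for _ in range(iterations):
        for rank in range(num_ranks):
            for call in ("MPI_Send", "MPI_Recv", "MPI_Allreduce"):
                sink.record(rank, call, t, t + 1.0)
                t += 1.0


NUM_RANKS, ITERATIONS = 64, 100

prof = Profiler()
full = ConfigurableTracer(frozenset(range(NUM_RANKS)), 10**9)  # trace everything
limited = ConfigurableTracer(frozenset({0, 63}), 50)           # two ranks, capped

for sink in (prof, full, limited):
    simulate(sink, NUM_RANKS, ITERATIONS)

print("profile records:      ", prof.size())
print("full trace events:    ", full.size())
print("limited trace events: ", limited.size())
```

In this toy model the profile stays at one record per call type regardless of rank count, the unrestricted trace grows with ranks times iterations, and the restricted trace is bounded by the configuration. Which knobs a real tool exposes is a design decision; the two used here are only one plausible choice.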