FACT: fast communication trace collection for parallel applications through program slicing

Authors:
Jidong Zhai;Tianwei Sheng;Jiangzhou He;Wenguang Chen;Weimin Zheng
Affiliations:
Tsinghua University, Beijing, China;Tsinghua University, Beijing, China;Tsinghua University, Beijing, China;Tsinghua University, Beijing, China;Tsinghua University, Beijing, China
Venue:
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
Year:
2009

Abstract

A proper understanding of the communication patterns of parallel applications is important for optimizing application performance and designing better communication subsystems. Communication patterns can be obtained by analyzing communication traces. However, existing approaches to generating communication traces must execute the entire parallel application on a full-scale system, which is time-consuming and expensive. In this paper, we propose a novel technique, called FACT, which performs FAst Communication Trace collection for large-scale parallel applications on small-scale systems. Our idea is to reduce the original program to a program slice through static analysis, and to execute the program slice to acquire the communication traces. The program slice preserves all the variables and statements in the original program that are relevant to spatial and volume communication attributes. This idea is based on the observation that most computation and message contents in message-passing parallel applications are independent of these attributes and can therefore be removed from the program for the purpose of communication trace collection. We have implemented FACT and evaluated it with the NPB programs and Sweep3D. The results show that FACT preserves the spatial and volume communication attributes of the original programs and reduces resource consumption by two orders of magnitude in most cases. For example, FACT collects the communication traces of Sweep3D for 512 processes on a 4-node (32 cores) platform in just 6.79 seconds, consuming 1.25 GB of memory, while the original program takes 256.63 seconds and consumes 213.83 GB of memory on a 32-node (512 cores) platform. Finally, we present an application of FACT.
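
The slicing idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy statement list, the `Stmt` record, and the `backward_slice` helper are all hypothetical, and the sketch assumes straight-line code with no branches, loops, or procedure calls, so it tracks data dependences only. The slicing criterion is the message size and destination of the send (volume and spatial attributes), not the buffer contents.

```python
from collections import namedtuple

# Hypothetical toy representation: each statement lists the variables
# it defines and the variables it uses.
Stmt = namedtuple("Stmt", "idx text defs uses")

PROGRAM = [
    Stmt(0, "rank = comm_rank()", {"rank"}, set()),
    Stmt(1, "size = comm_size()", {"size"}, set()),
    Stmt(2, "n = 1024", {"n"}, set()),
    Stmt(3, "buf = compute(n)", {"buf"}, {"n"}),
    Stmt(4, "dest = (rank + 1) % size", {"dest"}, {"rank", "size"}),
    Stmt(5, "checksum = reduce(buf)", {"checksum"}, {"buf"}),
    Stmt(6, "send(buf, count=n, dest=dest)", set(), {"buf", "n", "dest"}),
]

# Slicing criterion (an assumption of this sketch): at the send, only the
# count and the destination matter; the message contents (buf) do not.
CRITERIA = {6: {"n", "dest"}}


def backward_slice(program, criteria):
    """Return indices of statements kept in the slice.

    Walks the straight-line program backward, keeping every statement
    that defines a variable still needed by the criterion.
    """
    needed = set()
    kept = []
    for stmt in reversed(program):
        if stmt.idx in criteria:
            kept.append(stmt.idx)
            needed |= criteria[stmt.idx]
        elif stmt.defs & needed:
            kept.append(stmt.idx)
            needed -= stmt.defs   # this definition satisfies the need
            needed |= stmt.uses   # its operands are now needed instead
    return sorted(kept)


if __name__ == "__main__":
    kept = backward_slice(PROGRAM, CRITERIA)
    for stmt in PROGRAM:
        marker = "keep" if stmt.idx in kept else "drop"
        print(f"{marker}  {stmt.text}")
```

In this toy program the statements that compute and consume `buf` are dropped, while everything that determines the message size and destination is kept. A real slicer would also have to follow control dependences and procedure calls, which this sketch omits.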