This paper examines the explicit communication characteristics of several sophisticated scientific applications, which, by themselves, constitute a representative suite of publicly available benchmarks for large cluster architectures. By focusing on the message passing interface (MPI) and by using hardware counters on the microprocessor, we observe each application's inherent behavioral characteristics: point-to-point and collective communication, and floating-point operations. Furthermore, we explore the sensitivities of these characteristics to both problem size and number of processors. Our analysis reveals several striking similarities across our diverse set of applications, including the use of collective operations, especially collectives with very small data payloads. We also highlight a trend of novel applications parting with regimented, static communication patterns in favor of dynamically evolving patterns, as evidenced by our experiments on applications that use implicit linear solvers and adaptive mesh refinement. Overall, our study contributes a better understanding of the requirements of current and emerging paradigms of scientific computing in terms of their computation and communication demands.