This paper examines the explicit communication characteristics of several sophisticated scientific applications that together constitute a representative suite of publicly available benchmarks for large cluster architectures. By focusing on the Message Passing Interface (MPI) and using the microprocessor's hardware performance counters, we observe each application's inherent behavioral characteristics: point-to-point communication, collective communication, and floating-point operations. We further explore how sensitive these characteristics are to both problem size and number of processors. Our analysis reveals several striking similarities across our diverse set of applications, including their use of collective operations, particularly collectives with very small data payloads. We also highlight a trend of newer applications abandoning regimented, static communication patterns in favor of dynamically evolving ones, as evidenced by our experiments on applications that use implicit linear solvers and adaptive mesh refinement. Overall, our study contributes to a better understanding of the computation and communication demands of current and emerging paradigms of scientific computing.
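The abstract's analysis separates point-to-point traffic from collective traffic and picks out small-payload collectives. This can be illustrated with a minimal Python sketch. It assumes a hypothetical trace format with one record per MPI call, giving the routine name and payload size in bytes. Such a trace might be gathered through the MPI profiling (PMPI) interface. The routine groupings, the `CallRecord` type, the `summarize` helper, and the 64-byte "small payload" threshold are all illustrative assumptions, not details taken from the study.

```python
from collections import Counter
from dataclasses import dataclass

# Illustrative (not exhaustive) grouping of MPI routines by communication kind.
POINT_TO_POINT = {"MPI_Send", "MPI_Isend", "MPI_Recv", "MPI_Irecv", "MPI_Sendrecv"}
COLLECTIVE = {
    "MPI_Bcast", "MPI_Reduce", "MPI_Allreduce", "MPI_Alltoall",
    "MPI_Allgather", "MPI_Gather", "MPI_Scatter", "MPI_Barrier",
}


@dataclass(frozen=True)
class CallRecord:
    """One traced MPI call (hypothetical trace format)."""
    routine: str  # MPI routine name, e.g. "MPI_Allreduce"
    nbytes: int   # payload size in bytes


def summarize(trace, small_threshold=64):
    """Count calls by kind and count collectives with payload <= small_threshold bytes."""
    kinds = Counter()
    small_collectives = 0
    for rec in trace:
        if rec.routine in POINT_TO_POINT:
            kinds["point-to-point"] += 1
        elif rec.routine in COLLECTIVE:
            kinds["collective"] += 1
            if rec.nbytes <= small_threshold:
                small_collectives += 1
        else:
            # Anything not in either group (e.g. MPI_Wtime) is tallied separately.
            kinds["other"] += 1
    return {"counts": dict(kinds), "small_collectives": small_collectives}


if __name__ == "__main__":
    # A made-up trace: a halo exchange, two scalar reductions, a broadcast, a barrier.
    trace = [
        CallRecord("MPI_Isend", 32768),
        CallRecord("MPI_Irecv", 32768),
        CallRecord("MPI_Allreduce", 8),
        CallRecord("MPI_Allreduce", 8),
        CallRecord("MPI_Bcast", 4096),
        CallRecord("MPI_Barrier", 0),
    ]
    print(summarize(trace))
    # {'counts': {'point-to-point': 2, 'collective': 4}, 'small_collectives': 3}
```

The made-up trace mixes large point-to-point messages with scalar `MPI_Allreduce` calls and a zero-byte barrier. In that trace, three of the four collectives fall under the threshold. That is the kind of pattern the abstract reports, though the numbers here are invented for illustration only.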