Accurate performance evaluation, modelling and prediction of a message passing simulation code based on middleware

  • Authors:
  • Michela Taufer; Thomas Stricker

  • Affiliations:
  • Swiss Institute of Technology (ETH), CH-8092 Zuerich, Switzerland; Swiss Institute of Technology (ETH), CH-8092 Zuerich, Switzerland

  • Venue:
  • SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
  • Year:
  • 1998

Abstract

In distributed and vectorized computing, there is a large number of very different supercomputing platforms on which an application could run. Yet most traditional parallel codes are ill equipped to collect data about their resource usage or run-time behavior, such data are rarely published, and few scientists approach the planning of an application and its platform systematically. As an improvement over the current state of the art, we propose an integrated approach to performance evaluation, modeling and prediction across platforms. Our approach combines analytical modeling with systematically designed experiments using full application runs, reduced application kernels and selected benchmarks. We studied our performance-assessment methodology with Opal, a molecular biology code developed at our institution to run on our four Cray J90 "Classic" vector SMPs. Besides a detailed assessment of the performance achieved on the J90s, the primary goal of our study was to find the most suitable and most cost-effective hardware platform for the application; in particular, we checked the application's suitability for three flavors of Clusters of PCs (CoPs) built with off-the-shelf Intel Pentium processors: slow CoPs, SMP CoPs and fast CoPs. Because a model-based performance assessment is much easier than porting and parallelizing the application for a new target machine, we could readily obtain and include performance estimates for a T3E-900, a high-end MPP system. The predicted execution times and speedup figures indicate that a well-designed cluster of PCs achieves similar, if not better, performance than the J90 vector processors currently in use, and that its computational efficiency compares favorably with that of the T3E-900 for this particular application code.
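The abstract's central idea is that an analytical model lets one estimate execution time, speedup and efficiency on a machine without porting the code to it. The sketch below illustrates that idea with a deliberately simple compute-plus-communication cost model. It is an assumption for illustration only, not the model the authors built for Opal: the names `Platform`, `Workload`, `predict_time`, `speedup` and `efficiency` are hypothetical, and every parameter value is a placeholder rather than a measured figure for the J90, the T3E-900 or any CoP.

```python
from dataclasses import dataclass


@dataclass
class Platform:
    """Hypothetical machine parameters (placeholders, not measured values)."""
    name: str
    flops_per_sec: float      # sustained compute rate of one processor
    latency_s: float          # start-up cost per message
    bandwidth_bytes_s: float  # transfer rate per link


@dataclass
class Workload:
    """Hypothetical application parameters (placeholders)."""
    serial_flops: float       # work that does not parallelize
    parallel_flops: float     # work split evenly across processors
    messages_per_proc: int    # messages each processor sends (assumed independent of p)
    bytes_per_message: float


def predict_time(w: Workload, m: Platform, p: int) -> float:
    """Predicted wall-clock time on p processors: computation plus communication."""
    if p < 1:
        raise ValueError("p must be at least 1")
    compute = (w.serial_flops + w.parallel_flops / p) / m.flops_per_sec
    if p == 1:
        return compute  # a single processor sends no messages in this model
    # Linear latency/bandwidth cost for each message.
    comm = w.messages_per_proc * (m.latency_s + w.bytes_per_message / m.bandwidth_bytes_s)
    return compute + comm


def speedup(w: Workload, m: Platform, p: int) -> float:
    """Predicted speedup relative to a single processor of the same platform."""
    return predict_time(w, m, 1) / predict_time(w, m, p)


def efficiency(w: Workload, m: Platform, p: int) -> float:
    """Predicted parallel efficiency: speedup divided by processor count."""
    return speedup(w, m, p) / p


if __name__ == "__main__":
    cluster = Platform("hypothetical PC cluster", 1e8, 1e-5, 1e7)
    job = Workload(serial_flops=1e8, parallel_flops=1e10,
                   messages_per_proc=200, bytes_per_message=8e3)
    for p in (1, 2, 4, 8):
        print(p, round(predict_time(job, cluster, p), 3),
              round(efficiency(job, cluster, p), 3))
```

Comparing platforms then amounts to evaluating the same `Workload` against different `Platform` parameter sets. In a methodology like the one described, those parameters would come from kernel runs and benchmarks rather than from guesses.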