In distributed and vectorized computing, an application could run on a large number of very different supercomputing platforms. Therefore, most traditional parallel codes are ill equipped to collect data about their resource usage or their run-time behavior, the corresponding data are rarely published, and few scientists approach the planning of an application and its platform systematically. As an improvement over the current state of the art, we propose an integrated approach to performance evaluation, modeling and prediction for different platforms. Our approach combines analytical modeling with systematically designed experiments that use full application runs, reduced application kernels and some benchmarks. We studied our performance assessment methodology with Opal, an example code in molecular biology, developed at our institution to run on our four Cray J90 "Classic" vector SMPs. Besides a detailed assessment of the performance achieved on the J90s, the primary goal of our study was to find the most suitable and most cost-effective hardware platform for the application, and in particular to check the suitability of this application for slow CoPs, SMP CoPs and fast CoPs, three flavors of Clusters of PCs (CoPs) built with off-the-shelf Intel Pentium processors. A performance assessment based on our model is much easier than porting and parallelizing the application for a new target machine, so we could easily obtain and include performance estimates for a T3E-900, a high-end MPP system. The predicted execution times and speedup figures indicate that, for this particular application code, a well-designed cluster of PCs achieves similar if not better performance than the J90 vector processors currently used, and that its computational efficiency compares favorably to the T3E-900.
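To give a sense of what an analytical prediction of execution time and speedup can look like, here is a minimal sketch in Python. It is not the model used in the study: the formula (a serial part, evenly divided parallel work, and a communication cost that grows linearly with the number of processors), the function names and all numeric values are assumptions chosen purely for illustration.

```python
# Hypothetical analytical performance model (illustration only, not the
# model from the paper). All parameters are assumed inputs that would be
# obtained from measurements on the target platform.

def predicted_time(work_s, procs, comm_per_proc_s, serial_s=0.0):
    """Predict execution time in seconds on `procs` processors.

    work_s          -- parallelizable compute time on one processor (assumed)
    procs           -- number of processors, must be >= 1
    comm_per_proc_s -- communication cost added per extra processor (assumed)
    serial_s        -- non-parallelizable time (assumed)
    """
    if procs < 1:
        raise ValueError("procs must be at least 1")
    communication = comm_per_proc_s * (procs - 1)
    return serial_s + work_s / procs + communication


def predicted_speedup(work_s, procs, comm_per_proc_s, serial_s=0.0):
    """Speedup relative to the predicted single-processor time."""
    t1 = predicted_time(work_s, 1, comm_per_proc_s, serial_s)
    tp = predicted_time(work_s, procs, comm_per_proc_s, serial_s)
    return t1 / tp


if __name__ == "__main__":
    # Placeholder numbers, not measurements from any real platform.
    for p in (1, 2, 4, 8):
        t = predicted_time(100.0, p, 0.5, serial_s=2.0)
        s = predicted_speedup(100.0, p, 0.5, serial_s=2.0)
        print(f"procs={p}  time={t:.2f}s  speedup={s:.2f}")
```

In a model of this kind, comparing platforms amounts to substituting a different set of measured parameters for each machine, which is why a model-based estimate can be far cheaper than porting the full application.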