The NEC SX-4M cluster and Fujitsu VPP700 supercomputers are both based on custom vector processors using low-power CMOS technology. Their basic architectures and programming models, however, differ somewhat. A multi-node SX-4M cluster contains up to 32 processors per shared-memory node, with a maximum of 16 nodes connected via the proprietary NEC IXS fibre channel crossbar network. A hybrid combination of inter-node MPI message passing with intra-node tasking or threads is possible. The Fujitsu VPP700 is a fully distributed-memory vector machine with a crossbar interconnect, and it also supports MPI. The parallel performance of the MC2 model for high-resolution mesoscale forecasting over large domains and of the IFS RAPS 4.0 benchmark is presented for several different machine configurations. These include an SX-4/32, an SX-4/32M cluster and up to 100 PEs of the VPP700. Our results indicate that performance degradation for both models on a single SX-4 node is primarily due to memory contention within the internal crossbar switch. Multi-node SX-4 performance is slightly better than single-node performance. Longer vector lengths and SDRAM memory on the VPP700 result in lower per-processor execution rates. Both models achieve close to ideal scaling on the VPP700.
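As an illustration of what "close to ideal scaling" means in practice, the following minimal sketch computes speedup and parallel efficiency relative to a baseline run. The function names and the timing values are hypothetical and are not taken from the measurements reported here; the sketch only assumes the usual definition in which efficiency is 1.0 when doubling the processor count halves the wall-clock time.

```python
def speedup(t_base, t_p):
    """Speedup of a run taking t_p seconds relative to a baseline of t_base seconds."""
    return t_base / t_p


def parallel_efficiency(p_base, t_base, p, t_p):
    """Efficiency of a run on p processors relative to a baseline on p_base processors.

    A value of 1.0 corresponds to ideal scaling from the baseline.
    """
    return (t_base * p_base) / (t_p * p)


def scaling_table(timings):
    """Return (processors, speedup, efficiency) tuples, using the smallest run as baseline.

    `timings` maps processor count to wall-clock seconds.
    """
    p_base = min(timings)
    t_base = timings[p_base]
    return [
        (p, speedup(t_base, t), parallel_efficiency(p_base, t_base, p, t))
        for p, t in sorted(timings.items())
    ]


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds, for illustration only.
    example_timings = {10: 800.0, 20: 410.0, 50: 170.0, 100: 90.0}
    for p, s, e in scaling_table(example_timings):
        print(f"{p:4d} PEs  speedup {s:6.2f}  efficiency {e:5.2f}")
```

Measuring efficiency against the smallest available run, rather than against a single processor, is a common choice when the problem does not fit in the memory of one processing element.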