This paper describes the NEC SX-9, a new-generation vector parallel supercomputer. The SX-9 processor has a powerful core that achieves over 100 Gflop/s, together with a software-controllable on-chip cache that helps maintain a high ratio of memory bandwidth to floating-point operation rate. Its large SMP nodes, each with 16 vector processors delivering 1.6 Tflop/s and 1 TB of memory, are connected by dedicated network switches that provide inter-node communication at 128 GB/s per direction. The sustained performance of the SX-9 processor is evaluated on six practical applications and compared with conventional vector processors and with the latest scalar processors, such as the Nehalem-EP. Based on the results, this paper discusses performance tuning strategies for new-generation vector systems. A 16-node SX-9 system is also evaluated using the HPC Challenge benchmark suite and a CFD code. These results demonstrate that the SX-9 system achieves the highest sustained performance and scalability.
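The node figure and the emphasis on the ratio of memory bandwidth to floating-point rate can be illustrated with a little arithmetic. The Python sketch below computes a node's aggregate peak (processors times per-processor peak), a bytes-per-flop (B/F) ratio, and a simple roofline bound, min(peak, bandwidth times arithmetic intensity). It rounds the per-processor peak to 100 Gflop/s and assumes a hypothetical memory bandwidth of 250 GB/s per processor; that bandwidth value and the function names are illustrative assumptions, not figures or code from the paper.

```python
def node_peak_gflops(procs_per_node: int, gflops_per_proc: float) -> float:
    """Aggregate peak of an SMP node: number of processors times per-processor peak."""
    return procs_per_node * gflops_per_proc


def bytes_per_flop(mem_bw_gbs: float, peak_gflops: float) -> float:
    """B/F ratio: memory bandwidth (GB/s) divided by peak rate (Gflop/s)."""
    return mem_bw_gbs / peak_gflops


def roofline_gflops(peak_gflops: float, mem_bw_gbs: float, flops_per_byte: float) -> float:
    """Roofline bound: attainable rate is capped by either compute peak or memory traffic."""
    return min(peak_gflops, mem_bw_gbs * flops_per_byte)


# Per-processor peak from the abstract ("over 100 Gflop/s"), rounded down to 100.
PEAK_GFLOPS = 100.0
# ASSUMPTION: hypothetical per-processor memory bandwidth, not stated in the abstract.
MEM_BW_GBS = 250.0

if __name__ == "__main__":
    print(node_peak_gflops(16, PEAK_GFLOPS))               # 1600.0 Gflop/s, i.e. 1.6 Tflop/s
    print(bytes_per_flop(MEM_BW_GBS, PEAK_GFLOPS))         # 2.5 bytes per flop
    print(roofline_gflops(PEAK_GFLOPS, MEM_BW_GBS, 0.125)) # 31.25: memory-bound kernel
    print(roofline_gflops(PEAK_GFLOPS, MEM_BW_GBS, 1.0))   # 100.0: compute-bound kernel
```

A higher B/F ratio moves the point where a kernel stops being memory-bound toward lower arithmetic intensity, which is why a vector design that keeps this ratio high can sustain a larger fraction of peak on bandwidth-hungry scientific codes.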