Software pipelining: an effective scheduling technique for VLIW machines
PLDI '88 Proceedings of the ACM SIGPLAN 1988 conference on Programming Language design and Implementation
QCDPAX-an MIMD array of vector processors for the numerical simulation of quantum chromodynamics
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Data cache performance of supercomputer applications
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
An effective on-chip preloading scheme to reduce data access penalty
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Register allocation for software pipelined loops
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Pseudo vector processor based on register-windowed superscalar pipeline
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
High performance Fortran language specification
ACM SIGPLAN Fortran Forum
A scalar architecture for pseudo vector processing based on slide-windowed registers
ICS '93 Proceedings of the 7th international conference on Supercomputing
A performance study of software and hardware data prefetching schemes
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Performance of Various Computers Using Standard Linear Equations Software
Performance of Various Computers Using Standard Linear Equations Software
Hi-index | 0.00 |
CP-PACS (Computational Physics by Parallel Array Computer System) is a massively parallel processor with 2048 Processing Units built at Center for Computational Physics, University of Tsukuba. The node processor of CP-PACS is a RISC microprocessor enhanced by Psuedo Vector Processing feature, which can realize high-performance vector processing. The interconnection network is 3-dimensional Hyper-Crossbar Network, which has high flexibility and embeddability for various network topologies and communication patterns. The theoretical peak performance of whole system is 614.4 GFLOPS. In this paper, we describe the overview of CP-PACS architecture and several special architectural characteristics of it. The performance evaluation on parallel LINPACK benchmark is also shown.