The VPP500 vector parallel processor is a highly parallel, distributed-memory supercomputer with a performance range of 6.4 to 355 gigaFLOPS and a main memory capacity from 1 to 222 gigabytes. The system scales from 4 to 222 processors interconnected by a high-bandwidth crossbar network.

Three key aspects of the VPP500, which are in sharp contrast to current massively parallel systems, characterize its architecture. First, the building block is a 1.6 gigaFLOPS vector processor that is more than an order of magnitude faster than the processors used in massively parallel processors (MPPs). This high uniprocessor performance reduces the dependence on parallelism. Second, the distributed-memory architecture and high-bandwidth crossbar network eliminate many of the bottlenecks found in MPP systems. Together they allow efficient utilization of the hardware and lessen the complexity of programming parallel computers. Third, the system achieves high throughput through its ability to partition the processing elements arbitrarily for flexible multiprocessing.
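The quoted performance range follows from the per-processor figure: 4 to 222 processors at 1.6 gigaFLOPS each gives roughly 6.4 to 355 gigaFLOPS. The short sketch below reproduces that arithmetic. It assumes that aggregate peak performance is simply the processor count times the per-processor peak, and the names `peak_gflops`, `PEAK_GFLOPS_PER_PE`, `MIN_PES`, and `MAX_PES` are illustrative, not taken from the source. Sustained performance on real workloads is not modeled.

```python
# Figures taken from the abstract; the linear-scaling model is an assumption.
PEAK_GFLOPS_PER_PE = 1.6   # peak of one vector processing element
MIN_PES, MAX_PES = 4, 222  # supported configuration range


def peak_gflops(num_pes: int) -> float:
    """Aggregate peak in gigaFLOPS, assuming peak scales linearly with PE count."""
    if not MIN_PES <= num_pes <= MAX_PES:
        raise ValueError(f"PE count must be between {MIN_PES} and {MAX_PES}")
    return num_pes * PEAK_GFLOPS_PER_PE


if __name__ == "__main__":
    # Print the aggregate peak for the smallest and largest configurations.
    for n in (MIN_PES, MAX_PES):
        print(f"{n:3d} PEs: {peak_gflops(n):.1f} gigaFLOPS peak")
```

The largest configuration works out to 355.2 gigaFLOPS, which matches the abstract's rounded figure of 355.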