A real introduction to supercomputing: a user training course
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Improving Memory Traffic by Assembly-Level Exploitation of Reuses for Vector Registers
The Journal of Supercomputing
Performance characteristics of the Cray X1 and their implications for application performance tuning
Proceedings of the 18th annual international conference on Supercomputing
Scientific Computing in the $C^H$ Programming Language
Scientific Programming
Vector pipelining and chaining are clarified through timing and pipeline diagrams of the instruction execution process. The technique for evaluating the performance of concurrent vector operations on vector processors is assessed by testing two of the most widely used computers with vector facilities: the IBM 3090 and the Cray X-MP. Based on the test results, analyzed at the assembler level, suggestions about vectorization on these two machines are given for machine users and designers. The ideas presented can be applied to other vector processors; the actual implementations, however, may differ depending on the individual machine architecture.
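
As a rough illustration of why chaining matters, the following sketch compares cycle counts for a sequence of dependent vector operations with and without chaining. It is a simplified textbook-style model, not the evaluation technique used in the paper: it assumes each pipeline delivers one result per cycle after a fixed startup latency, and the function names and startup values are hypothetical rather than measured figures for the IBM 3090 or the Cray X-MP.

```python
def unchained_cycles(n, startups):
    """Cycles for dependent vector ops when each op must wait for the
    previous one to finish all n elements before it can start."""
    return sum(s + n for s in startups)


def chained_cycles(n, startups):
    """Cycles when each op starts as soon as the first element of its
    input is available, so the pipelines overlap (chaining)."""
    return sum(startups) + n


if __name__ == "__main__":
    n = 64             # vector length (assumed)
    startups = [6, 7]  # hypothetical startup latencies, e.g. add then multiply
    print(unchained_cycles(n, startups))  # 141
    print(chained_cycles(n, startups))    # 77
```

In this model the saving from chaining grows with the vector length and with the number of dependent operations, since the per-element time is paid once instead of once per operation. Real machines add memory-port, register-bank, and issue constraints that this sketch ignores.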