The cache performance and optimizations of blocked algorithms
ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
A scalar architecture for pseudo vector processing based on slide-windowed registers
ICS '93 Proceedings of the 7th international conference on Supercomputing
Solving Linear Systems on Vector and Shared Memory Computers
Solving Linear Systems on Vector and Shared Memory Computers
Performance of Various Computers Using Standard Linear Equations Software
Performance of Various Computers Using Standard Linear Equations Software
Hi-index | 0.00 |
CP-PACS (Computational Physics by Parallel Array Computer System) is a massively parallel processing system with 2048 node processors for large scale scientific calculations. On a node processor of CP-PACS, there is a special hardware feature called PVP-SW (Pseudo Vector Processor based on Slide Window), which realizes an efficient vector processing on a superscalar processor without depending on the cache.In this paper, we present the effectiveness of PVP-SW by performance measurement on single node processor for LINPACK benchmark. Utilizing loop unrolling techniques and Block-TLB feature, PVP-SW function improves the basic performance up to 3.3 times faster for 1000x1000 LINPACK. This performance corresponds to the 73% of theoretical peak.