Performance Improvement for Matrix Calculation on CP-PACS Node Processor

Authors:
Y. Abei; K. Itakura; T. Boku; H. Nakamura; K. Nakazawa
Venue:
HPC-Asia '97: Proceedings of High-Performance Computing on the Information Superhighway
Year:
1997

Abstract

CP-PACS (Computational Physics by Parallel Array Computer System) is a massively parallel processing system with 2048 node processors for large-scale scientific calculations. Each CP-PACS node processor has a special hardware feature called PVP-SW (Pseudo Vector Processor based on Slide Window), which achieves efficient vector processing on a superscalar processor without depending on the cache. In this paper, we demonstrate the effectiveness of PVP-SW through performance measurements of the LINPACK benchmark on a single node processor. Using loop unrolling techniques and the Block-TLB feature, PVP-SW improves performance by up to 3.3 times over the basic version for the 1000×1000 LINPACK. This corresponds to 73% of the theoretical peak performance.