We investigate an automatic tuning method for an eigensolver for dense symmetric matrices, focusing on how to select the loop-unrolling depth. To this end, we evaluate the performance of several unrolled versions of the eigensolver's reduction loops for every matrix size from 3000 to 4000 on the Hitachi SR8000/F1 and on the IBM RS/6000 SP3. We also analyze the relationship between Byte/Flop and performance for various loop-unrolling patterns. The results show that, for some matrix sizes, performance degrades at higher unrolling depths but not at lower ones. They also show that the choice of unrolling depth should be evaluated across several matrix sizes.
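
As a rough illustration of the idea, the following sketch times a matrix-vector product at outer-loop unrolling depths 1, 2, and 4 and picks the fastest depth for each matrix size. It is not the eigensolver or the tuning code from the paper: the kernel, the names `matvec_d1`, `matvec_d2`, `matvec_d4`, `select_depth`, and `tune`, and the Byte/Flop model in `bytes_per_flop` (one load of `x[j]` shared by `depth` rows, 8-byte words) are all assumptions made for this example.

```python
import time

def _rows(a, x, y, start, stop):
    """Depth-1 kernel for rows start..stop-1; also handles remainder rows."""
    for i in range(start, stop):
        row, s = a[i], 0.0
        for j, xj in enumerate(x):
            s += row[j] * xj
        y[i] = s

def matvec_d1(a, x):
    """y = A x with no unrolling."""
    y = [0.0] * len(a)
    _rows(a, x, y, 0, len(a))
    return y

def matvec_d2(a, x):
    """y = A x with the outer loop unrolled by 2 (two rows share each x[j])."""
    n = len(a)
    y = [0.0] * n
    main = n - n % 2
    for i in range(0, main, 2):
        r0, r1 = a[i], a[i + 1]
        s0 = s1 = 0.0
        for j, xj in enumerate(x):
            s0 += r0[j] * xj
            s1 += r1[j] * xj
        y[i], y[i + 1] = s0, s1
    _rows(a, x, y, main, n)  # leftover row when n is odd
    return y

def matvec_d4(a, x):
    """y = A x with the outer loop unrolled by 4."""
    n = len(a)
    y = [0.0] * n
    main = n - n % 4
    for i in range(0, main, 4):
        r0, r1, r2, r3 = a[i], a[i + 1], a[i + 2], a[i + 3]
        s0 = s1 = s2 = s3 = 0.0
        for j, xj in enumerate(x):
            s0 += r0[j] * xj
            s1 += r1[j] * xj
            s2 += r2[j] * xj
            s3 += r3[j] * xj
        y[i], y[i + 1], y[i + 2], y[i + 3] = s0, s1, s2, s3
    _rows(a, x, y, main, n)  # leftover rows when n is not a multiple of 4
    return y

KERNELS = {1: matvec_d1, 2: matvec_d2, 4: matvec_d4}

def bytes_per_flop(depth, word=8):
    """Assumed model: per inner iteration, depth loads of A plus one load of x,
    against 2 * depth floating-point operations."""
    return word * (depth + 1) / (2 * depth)

def select_depth(n, repeats=3):
    """Time every kernel on an n-by-n problem and return the fastest depth."""
    a = [[float((i + j) % 7) for j in range(n)] for i in range(n)]
    x = [float(j % 5) for j in range(n)]
    best_depth, best_time = None, float("inf")
    for depth, kernel in KERNELS.items():
        elapsed = float("inf")
        for _ in range(repeats):  # keep the best of several runs to damp noise
            t0 = time.perf_counter()
            kernel(a, x)
            elapsed = min(elapsed, time.perf_counter() - t0)
        if elapsed < best_time:
            best_depth, best_time = depth, elapsed
    return best_depth

def tune(sizes):
    """Map each matrix size to the depth that ran fastest for that size."""
    return {n: select_depth(n) for n in sizes}
```

Calling `tune(range(100, 200, 10))` would return one selected depth per size. Because the selection is made per size, a depth that wins at one size is not assumed to win at another, which is the point the abstract makes about examining several matrix sizes. Timings in an interpreted language say little about the machines studied here, so the sketch shows only the selection procedure.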