Conventional auto-tuning numerical software has two drawbacks: (1) the sampling points used for performance estimation are fixed, and (2) it adapts poorly to heterogeneous environments. To overcome these drawbacks, we developed ABCLib_DRSSED, a parallel eigensolver with an auto-tuning facility. ABCLib_DRSSED provides three features. The first is a set of functions based on sampling points that the end user constructs through an interface. The second is a load-balancer for the data to be distributed. The third is a new auto-tuning optimization timing called Before Execute-time Optimization (BEO). In our performance evaluation of BEO, we obtained speedups of 10% to 90%, and a speedup of 340% in a case where the estimation failed. In the evaluation of the load-balancer, performance improved by 220%.
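The abstract does not say how sampling points become a performance estimate, or how the load-balancer splits the data. The following is a minimal sketch of one common approach, written under stated assumptions.

Execution time for each candidate tuning-parameter value is modeled as a polynomial in the problem size n. The polynomial is fitted by least squares to timings measured at a few sampling points. The candidate with the lowest predicted time at the user's actual n is then selected. We assume this is roughly what BEO does once the problem size is known, just before execution. The load-balancer is sketched as a split of matrix rows in proportion to each processor's relative speed.

All function names, the polynomial degree, and the proportional-split rule are hypothetical illustrations, not ABCLib_DRSSED's actual API or algorithm.

```python
from typing import Dict, List, Sequence, Tuple


def fit_polynomial(sizes: Sequence[float], times: Sequence[float], degree: int) -> List[float]:
    """Least-squares fit of time(n) ~ c0 + c1*n + ... + cd*n**d (hypothetical cost model).

    Solves the normal equations (A^T A) c = A^T t, where A is the Vandermonde
    matrix of the sampling points. Normal equations lose accuracy for large n;
    a production version might rescale n or use a QR factorization instead.
    """
    m = degree + 1
    if len(sizes) < m:
        raise ValueError("need at least degree + 1 sampling points")
    ata = [[float(sum(n ** (i + j) for n in sizes)) for j in range(m)] for i in range(m)]
    aty = [float(sum(t * n ** i for n, t in zip(sizes, times))) for i in range(m)]
    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for r in range(col + 1, m):
            factor = ata[r][col] / ata[col][col]
            for k in range(col, m):
                ata[r][k] -= factor * ata[col][k]
            aty[r] -= factor * aty[col]
    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        s = sum(ata[i][k] * coeffs[k] for k in range(i + 1, m))
        coeffs[i] = (aty[i] - s) / ata[i][i]
    return coeffs


def predict(coeffs: Sequence[float], n: float) -> float:
    """Evaluate the fitted polynomial at problem size n."""
    return sum(c * n ** i for i, c in enumerate(coeffs))


def choose_parameter(
    samples: Dict[int, Tuple[Sequence[float], Sequence[float]]], n: float, degree: int = 2
) -> int:
    """Pick the parameter value whose fitted model predicts the lowest time at size n.

    samples maps each candidate parameter value to (sampled sizes, measured times).
    """
    models = {p: fit_polynomial(s, t, degree) for p, (s, t) in samples.items()}
    return min(models, key=lambda p: predict(models[p], n))


def balance_rows(total_rows: int, speeds: Sequence[float]) -> List[int]:
    """Split rows in proportion to relative processor speeds (largest-remainder rounding)."""
    total_speed = sum(speeds)
    shares = [total_rows * s / total_speed for s in speeds]
    counts = [int(x) for x in shares]
    leftover = total_rows - sum(counts)
    # Hand the remaining rows to the processors with the largest fractional shares.
    order = sorted(range(len(speeds)), key=lambda i: shares[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts
```

Consider timings that follow 1 + n² for one parameter value and 5 + 0.5n² for another. The first value wins at small n and the second at large n. Plausibly, this is where user-chosen sampling points help: placing samples near such a crossover sharpens the estimate there. A processor measured as twice as fast as another receives about twice as many rows from `balance_rows`.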