Performance Modeling and Optimal Block Size Selection for a BLAS-3 Based Tridiagonalization Algorithm

  • Authors:
  • Yusaku Yamamoto

  • Affiliations:
  • Nagoya University, Japan

  • Venue:
  • HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
  • Year:
  • 2005

Abstract

We construct a performance model for Bischof & Wu's tridiagonalization algorithm that is fully based on the level-3 BLAS. The model has a hierarchical structure that reflects the hierarchical structure of the original algorithm. Given the matrix size, the two block sizes, and performance data for the underlying BLAS routines, it predicts the execution time of the algorithm. Experiments on the Opteron and Alpha 21264A processors show that the model predicts the performance of the algorithm for matrix sizes from 1920 to 7680 and for various block sizes with relative errors below 10%. The model will serve as a key component of an automatically tuned library that selects the optimal block sizes itself. It can also be used in a Grid environment to help users find which of the available machines will solve their problem in the shortest time.
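
To illustrate how a time-prediction model could drive block size selection, the following is a minimal Python sketch. It is not the model from the paper: `select_block_sizes`, `toy_predict_time`, the parameter names `block_a` and `block_b`, and every constant in the toy predictor are hypothetical, chosen only to make the example runnable. The assumption is simply that some predictor maps the matrix size and the two block sizes to an estimated execution time, and that the tuner picks the candidate pair with the smallest estimate.

```python
from itertools import product
from typing import Callable, Iterable, Tuple


def select_block_sizes(
    n: int,
    candidates_a: Iterable[int],
    candidates_b: Iterable[int],
    predict_time: Callable[[int, int, int], float],
) -> Tuple[float, int, int]:
    """Exhaustively search candidate block-size pairs.

    Returns (predicted_time, block_a, block_b) for the pair with the
    smallest predicted execution time.
    """
    best = None
    for block_a, block_b in product(list(candidates_a), list(candidates_b)):
        t = predict_time(n, block_a, block_b)
        if best is None or t < best[0]:
            best = (t, block_a, block_b)
    if best is None:
        raise ValueError("no candidate block sizes were supplied")
    return best


def toy_predict_time(n: int, block_a: int, block_b: int) -> float:
    """Toy stand-in for a real performance model (all numbers invented).

    A real model would be built from measured BLAS timings; here the
    BLAS rate simply grows with block_b, and block_a adds a small
    overhead, so that the search has something to optimize.
    """
    flops = (4.0 / 3.0) * n ** 3            # leading-order flop count
    peak = 4.0e9                            # hypothetical peak flop/s
    rate = peak * block_b / (block_b + 16)  # hypothetical efficiency curve
    overhead = 1.0 + block_a / n            # hypothetical extra work
    return flops * overhead / rate


if __name__ == "__main__":
    t, a, b = select_block_sizes(1920, [32, 64], [16, 32, 64], toy_predict_time)
    print(f"predicted {t:.2f} s with block sizes ({a}, {b})")
```

Because the predictor is passed in as a function, the same search could be repeated with one predictor per machine to compare predicted times across machines, again under the assumption that such per-machine predictors are available.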