Performance tuning of matrix triple products based on matrix structure

Authors:
Eun-Jin Im; Ismail Bustany; Cleve Ashcraft; James W. Demmel; Katherine A. Yelick
Affiliations:
Kookmin University, Seoul, Korea; Barcelona Design Inc; Livermore Software Technology Corporation; U.C. Berkeley; U.C. Berkeley
Venue:
PARA'04: Proceedings of the 7th International Conference on Applied Parallel Computing: State of the Art in Scientific Computing
Year:
2004

Abstract

Sparse matrix computations arise in many scientific and engineering applications, but their performance is limited by the growing gap between processor and memory speeds. In this paper, we present a case study of an important sparse matrix triple product problem that commonly arises in primal-dual optimization methods. Instead of a generic two-phase algorithm, we devise and implement a single-pass algorithm that exploits the block diagonal structure of the matrix. Our algorithm uses fewer floating-point operations and roughly half the memory of the two-phase algorithm. The speed-up of the one-phase scheme over the two-phase scheme is 2.04 on a 900 MHz Intel Itanium-2, 1.63 on a 1 GHz Power-4, and 1.99 on a 900 MHz Sun Ultra-3.
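
The difference between the two schemes can be illustrated with a small sketch. The abstract does not state the exact form of the triple product, so the code below assumes it is P = A H A^T with H block diagonal, which is one common shape in primal-dual methods. The function names (`two_phase`, `one_phase`, `block_diag`) and the dense list-of-lists storage are illustrative assumptions only; the paper's implementation works on sparse data structures and is not reproduced here.

```python
def matmul(X, Y):
    """Dense product of two list-of-lists matrices."""
    inner, cols = len(Y), len(Y[0])
    return [[sum(row[k] * Y[k][j] for k in range(inner)) for j in range(cols)]
            for row in X]


def transpose(X):
    """Return the transpose of a list-of-lists matrix."""
    return [list(col) for col in zip(*X)]


def block_diag(blocks):
    """Assemble a full square matrix from a list of square diagonal blocks."""
    n = sum(len(b) for b in blocks)
    H = [[0.0] * n for _ in range(n)]
    off = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, v in enumerate(row):
                H[off + i][off + j] = v
        off += len(b)
    return H


def two_phase(A, blocks):
    """Generic scheme (assumed form): materialize B = A*H, then form B*A^T."""
    B = matmul(A, block_diag(blocks))  # full intermediate product is stored
    return matmul(B, transpose(A))


def one_phase(A, blocks):
    """Single pass: accumulate P += A_k * H_k * A_k^T one block at a time."""
    m = len(A)
    P = [[0.0] * m for _ in range(m)]
    off = 0
    for Hk in blocks:
        s = len(Hk)
        Ak = [row[off:off + s] for row in A]  # columns of A that meet block k
        Tk = matmul(Ak, Hk)                   # small m-by-s temporary only
        for i in range(m):
            for j in range(m):
                P[i][j] += sum(Tk[i][c] * Ak[j][c] for c in range(s))
        off += s
    return P


# Hypothetical example data: A is 2x3, H has one 2x2 block and one 1x1 block.
A = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 3.0]]
blocks = [[[2.0, 1.0],
           [1.0, 2.0]],
          [[4.0]]]

print(one_phase(A, blocks))  # [[14.0, 5.0], [5.0, 38.0]]
print(two_phase(A, blocks))  # same result via the full intermediate
```

In this sketch the single-pass version never stores the full intermediate A H; it keeps only a temporary as wide as the current block, which is the kind of memory saving the abstract describes. The sketch makes no attempt to reproduce the reported speed-ups.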