An operation stacking framework for large ensemble computations
Proceedings of the 21st annual international conference on Supercomputing
Sparse matrix operations achieve only small fractions of peak CPU speed because of the specialized, index-based matrix representations they use, which degrade cache utilization by imposing irregular memory accesses and increasing the overall number of accesses. Compounding the problem, the small number of floating-point operations in a single sparse iteration leads to low floating-point pipeline utilization. Operation stacking addresses these problems for large ensemble computations that solve multiple systems of linear equations with identical sparsity structure. By combining the data of multiple problems and solving them as one, operation stacking improves locality, reduces cache misses, and increases floating-point pipeline utilization. Operation stacking also requires less memory bandwidth because it involves fewer index array accesses. In this paper we present the Operation Stacking Framework (OSF), an object-oriented framework that provides runtime and code generation support for the development of stacked iterative solvers. OSF's runtime component provides an iteration engine that supports efficient ejection of converged problems from the stack. It separates the specific solver algorithm from the coding conventions and data representations that are necessary to implement stacking. Stacked solvers created with OSF can be used transparently, without significant changes to existing applications. Our results show that stacking can provide speedups of up to 1.94×, with an average of 1.46×, even in scenarios in which the number of iterations required to converge varies widely within a stack of problems. Our evaluation shows that these improvements correlate with better cache utilization, improved floating-point utilization, and reduced memory accesses.
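The data-layout idea behind stacking can be illustrated with a short sketch (hypothetical code, not taken from the paper): when k matrices share one sparsity pattern, their nonzero values can be interleaved behind a single set of CSR index arrays, so each index load is amortized across k multiply-add operations. The function and parameter names below are illustrative assumptions, not OSF's actual API.

```c
#include <stddef.h>

/* Sketch of a stacked sparse matrix-vector multiply over a shared
 * CSR sparsity pattern. The row_ptr and col_idx arrays are stored
 * once; the k nonzero values for each position are interleaved, so
 * every index load is reused by all k problems in the stack. */
void stacked_spmv(size_t n_rows, size_t k,
                  const size_t *row_ptr,  /* length n_rows + 1          */
                  const size_t *col_idx,  /* length row_ptr[n_rows]     */
                  const double *vals,     /* k interleaved values/nonzero */
                  const double *x,        /* k interleaved input vectors  */
                  double *y)              /* k interleaved result vectors */
{
    for (size_t i = 0; i < n_rows; i++) {
        for (size_t s = 0; s < k; s++)
            y[i * k + s] = 0.0;
        for (size_t p = row_ptr[i]; p < row_ptr[i + 1]; p++) {
            size_t j = col_idx[p];  /* one index read serves k problems */
            for (size_t s = 0; s < k; s++)
                y[i * k + s] += vals[p * k + s] * x[j * k + s];
        }
    }
}
```

Under this kind of layout, ejecting a converged problem would amount to reducing k and compacting the interleaved value and vector arrays, while the shared row_ptr and col_idx arrays stay untouched.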