An operation stacking framework for large ensemble computations

  • Authors:
  • Mehmet Belgin;Calvin J. Ribbens;Godmar Back

  • Affiliations:
  • Virginia Tech, Blacksburg, VA (all authors)

  • Venue:
  • Proceedings of the 21st Annual International Conference on Supercomputing (ICS '07)
  • Year:
  • 2007


Abstract

Iterative solutions of sparse problems often achieve only a small fraction of peak theoretical performance on modern architectures. The problem is highly challenging because sparse matrix storage schemes require data to be accessed irregularly, which leads to high cache miss rates. Furthermore, the inner loop of a typical sparse matrix operation accesses only a small and variable amount of data, which not only leads to low utilization of floating-point registers, but also prevents optimization techniques that improve instruction-level parallelism (ILP), such as unroll-and-jam. Although a general solution to this problem has not been found, significant performance improvements can be made for at least one important special case, namely large ensemble computations, which run the same application repeatedly on different data sets. In this paper, we present the Operation Stacking Framework (OSF), which runs multiple sparse problems simultaneously, stacking their data and solving them as one, thus improving both cache and ILP utilization. Programmers can use stacked solvers transparently in their applications. Moreover, OSF provides an API that makes it simple to convert existing solvers, such as the conjugate gradient (CG) and generalized minimal residual (GMRES) methods, into stacked form. Our experimental results show that stacking can reduce the number of L2 misses by 25% to 44%, resulting in performance improvements of up to 1.95x, with an average of 1.60x, for stacked CG and GMRES algorithms on a single CPU.
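To illustrate the core idea behind stacking, here is a minimal sketch (not OSF's actual API) of a stacked sparse matrix-vector product in CSR format. It assumes, for simplicity, that all K stacked problems share one sparsity pattern, so their nonzero values and vector entries can be interleaved; the innermost loop then has a fixed trip count of K, which is exactly what enables unroll-and-jam and improves register and cache utilization. The function name `stacked_spmv` and the data layout are illustrative assumptions.

```python
K = 4  # number of problems stacked together (illustrative choice)

def stacked_spmv(m, rowptr, col, val, x):
    """Stacked CSR sparse matrix-vector product y = A_s * x_s for s = 0..K-1.

    rowptr, col: shared CSR sparsity pattern for all K matrices.
    val[j]: length-K list holding the j-th nonzero of each stacked matrix.
    x[c]:   length-K list holding entry c of each stacked input vector.
    Interleaving the K problems gives the inner loop a fixed trip count,
    unlike an ordinary SpMV whose row lengths vary.
    """
    y = [[0.0] * K for _ in range(m)]
    for i in range(m):
        for j in range(rowptr[i], rowptr[i + 1]):
            c = col[j]
            # Fixed-length inner loop over the K stacked problems:
            # a compiler can unroll-and-jam this, and val[j]/x[c] are
            # contiguous, improving cache-line utilization.
            for s in range(K):
                y[i][s] += val[j][s] * x[c][s]
    return y

# Example: two 2x2 diagonal systems stacked K ways; matrix s is (s+1) * I.
rowptr = [0, 1, 2]
col = [0, 1]
val = [[s + 1.0 for s in range(K)] for _ in range(2)]
x = [[1.0] * K for _ in range(2)]
print(stacked_spmv(2, rowptr, col, val, x))  # → [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]
```

The same interleaving applies to the vector operations (dot products, AXPYs) inside CG or GMRES, which is why entire solvers, not just the SpMV kernel, can be converted to stacked form.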