LLNL's hypre library is an object-oriented library for the solution of sparse linear systems on parallel computers. While hypre facilitates rapid prototyping of complex parallel applications, our experience is that, without careful attention to temporal data locality, the node performance of applications developed using hypre will fall significantly short of peak performance on architectures based on modern microprocessors. In this paper, we describe our experiences analyzing and tuning the performance of smg98, a benchmark that exercises hypre's semicoarsening multigrid solver. In the original code, the lack of temporal data reuse in registers and caches significantly hurts performance. We describe a variety of techniques we applied to hand-tune the solver, and we expect that similar strategies will apply to other solvers and codes based on hypre. We present performance measurements of smg98 on both SGI Origin and Compaq Alpha platforms. Overall, our optimizations improve the node performance of smg98 by nearly a factor of two on large problems.