Improving High-Performance Sparse Libraries Using Compiler-Assisted Specialization: A PETSc Case Study

Authors:
Shreyas Ramalingam;Mary Hall;Chun Chen
Affiliations:
-;-;-
Venue:
IPDPSW '12 Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum
Year:
2012

Citing 0
Cited 3

Polyhedra scanning revisited

Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Towards making autotuning mainstream

International Journal of High Performance Computing Applications
A fast and resource-conscious MPI message queue mechanism for large-scale jobs

Future Generation Computer Systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Scientific libraries are written in a general way in anticipation of a variety of use cases that reduce optimization opportunities. Significant performance gains can be achieved by specializing library code to its execution context: the application in which it is invoked, the input data set used, the architectural platform and its backend compiler. Such specialization is not typically done because it is time consuming, leads to nonportable code and requires performance-tuning expertise that application scientists may not have. Tool support for library specialization in the above context could potentially reduce the extensive understanding required while significantly improving performance, code reuse and portability. In this work, we study the performance gains achieved by specializing the single processor sparse linear algebra functions in PETSc (Portable, Extensible Toolkit for Scientific Computation) in the context of three scalable scientific applications on the Hopper Cray XE6 Supercomputer at NERSC. We use CHiLL (Compos able High-Level Loop Transformation Framework) to apply source level transformations tailored to the special needs of sparse computations and automatically generate highly optimized PETSc functions. We demonstrate significant performance improvements of more than 1.8X on the library functions and overall gains of 9 to 24% on three scalable applications that use PETSc's sparse matrix capabilities.