On improving performance and energy profiles of sparse scientific applications

Authors:
Konrad Malkowski;Ingyu Lee;Padma Raghavan;Mary Jane Irwin
Affiliations:
Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA;Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA;Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA;Department of Computer Science and Engineering, The Pennsylvania State University, University Park, PA
Venue:
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Year:
2006

Citing 11
Cited 0

An extended set of FORTRAN basic linear algebra subprograms

ACM Transactions on Mathematical Software (TOMS)
The SimpleScalar tool set, version 2.0

ACM SIGARCH Computer Architecture News
Improving the memory-system performance of sparse-matrix vector multiplication

IBM Journal of Research and Development
Wattch: a framework for architectural-level power analysis and optimizations

Proceedings of the 27th annual international symposium on Computer architecture
Performance modeling and tuning of an unstructured mesh CFD application

Proceedings of the 2000 ACM/IEEE conference on Supercomputing
SimpleScalar: An Infrastructure for Computer System Modeling

Computer
An overview of the BlueGene/L Supercomputer

Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Performance optimizations and bounds for sparse matrix-vector multiply

Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Fine-Grained Dynamic Voltage and Frequency Scaling for Precise Energy and Performance Trade-Off Based on the Ratio of Off-Chip Access to On-Chip Computation Times

Proceedings of the conference on Design, automation and test in Europe - Volume 1
A Performance and Scalability Analysis of the BlueGene/L Architecture

Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Conjugate gradient sparse solvers: performance-power characteristics

IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing

Quantified Score

Hi-index	0.00

Visualization

Abstract

In many scientific applications, the majority of the execution time is spent within a few basic sparse kernels such as sparse matrix vector multiplication (SMV). Such sparse kernels can utilize only a fraction of the available processing speed because of their relatively large number of data accesses per floating point operation, and limited data locality and data re-use. Algorithmic changes and tuning of codes through blocking and loop unrolling schemes can improve performance but such tuned versions are typically not available in benchmark suites such as the SPEC CFP 2000. In this paper, we consider sparse SMV kernels with different levels of tuning that are representative of this application space. We emulate certain memory subsystem optimizations using SimpleScalar and Wattch to evaluate improvements in performance and energy metrics. We also characterize how such an evaluation can be affected by the interplay between code tuning and memory subsystem optimizations. Our results indicate that the optimizations reduce execution time by over 40%, and the energy by over 85%, when used with power control modes of CPUs and caches. Furthermore, the relative impact of the same set of memory subsystem optimizations can vary significantly depending on the level of code tuning. Consequently, it may be appropriate to augment traditional benchmarks by tuned kernels typical of high performance sparse scientific codes to enable comprehensive evaluations of future systems.