Deciding where to call performance libraries

Authors:
Christophe Alias; Denis Barthou
Affiliations:
Laboratoire PRiSM, Université de Versailles, France;Laboratoire PRiSM, Université de Versailles, France
Venue:
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Year:
2005

Abstract

As both programs and machines become more complex, writing high-performance code is an increasingly difficult task. To bridge the gap between compiled-code performance and peak performance, resorting to domain-specific or architecture-specific libraries has become compulsory. However, the programmer must decide when and where to use a library function. The compiler does not question this partition between library and user code, although it has a great impact on performance. In this paper we propose a new method that helps users find all code fragments in their application that can be replaced by library calls. The same technique can be used to change multiple calls or fuse them into more efficient ones. The results of the alternative detection of BLAS 1 and 2 in SPEC are presented.
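To make the idea concrete, the following minimal sketch shows the kind of equivalence such a method would have to establish. It is an illustration only and not the authors' detection technique: the functions `axpy`, `dot` and `gemv` are hypothetical pure-Python stand-ins named after BLAS level 1 and level 2 operations, and `user_loop` and `repeated_dots` are invented examples of user code.

```python
def axpy(alpha, x, y):
    """Hypothetical stand-in for a BLAS level 1 'axpy': alpha * x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def dot(x, y):
    """Hypothetical stand-in for a BLAS level 1 dot product."""
    return sum(xi * yi for xi, yi in zip(x, y))


def gemv(a, x):
    """Hypothetical stand-in for a BLAS level 2 matrix-vector product a * x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def user_loop(alpha, x, y):
    """Invented user fragment: a hand-written loop computing the same as axpy."""
    out = list(y)
    for i in range(len(x)):
        out[i] = out[i] + alpha * x[i]
    return out


def repeated_dots(a, x):
    """Invented user fragment: one level 1 call per matrix row.

    The whole loop computes the same values as a single level 2 call.
    """
    return [dot(row, x) for row in a]


if __name__ == "__main__":
    # The loop and the library-style call agree on the same inputs.
    print(user_loop(2, [1, 2, 3], [10, 20, 30]) == axpy(2, [1, 2, 3], [10, 20, 30]))
    # Several level 1 calls agree with one fused level 2 call.
    print(repeated_dots([[1, 2], [3, 4]], [5, 6]) == gemv([[1, 2], [3, 4]], [5, 6]))
```

In this sketch, `user_loop` is a candidate for replacement by one `axpy` call, and the loop of `dot` calls in `repeated_dots` is a candidate for fusion into one `gemv` call. Checking equality on sample inputs only illustrates the equivalence; it does not prove it for all inputs.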