Journal of Parallel and Distributed Computing
As both programs and machines become more complex, writing high-performance code is an increasingly difficult task. To bridge the gap between compiled-code performance and peak performance, resorting to domain-specific or architecture-specific libraries has become compulsory. However, the decision of when and where to use a library function is left to the programmer. This partition between library code and user code is not questioned by the compiler, although it has a great impact on performance. In this paper we propose a new method that helps users find all code fragments in their application that can be replaced by library calls. The same technique can be used to modify multiple calls or fuse them into more efficient ones. We present results for the detection of BLAS 1 and 2 alternatives in SPEC.
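To make the idea of a replaceable code fragment concrete, here is a minimal sketch. It is not the paper's detection method; it only illustrates the kind of equivalence such a method would have to establish. The names `axpy` and `user_fragment` are hypothetical, and `axpy` is a pure-Python stand-in for the BLAS level 1 AXPY operation rather than a binding to a real BLAS library.

```python
# Illustrative only: pure-Python stand-ins, not a real BLAS binding.

def axpy(alpha, x, y):
    """Stand-in for the BLAS level 1 AXPY operation: returns alpha*x + y."""
    return [alpha * xi + yi for xi, yi in zip(x, y)]


def user_fragment(alpha, x, y):
    """Hand-written loop, written differently from axpy but computing the same values."""
    out = list(y)              # work on a copy so the input is not modified
    i = 0
    while i < len(x):          # explicit counter instead of a comprehension
        out[i] = out[i] + x[i] * alpha   # operands in a different order
        i += 1
    return out


x = [1.0, 2.0, 3.0]
y = [10.0, 20.0, 30.0]
result_loop = user_fragment(2.0, x, y)
result_call = axpy(2.0, x, y)
```

The two functions differ in loop form and operand order, yet they produce the same vector. Recognizing this kind of equivalence despite syntactic variation is what allows the loop to be replaced by a single library call.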