This paper presents the Benchmark Template Library in C++ (BTL++), a flexible framework for measuring the run time of user-defined computational kernels. When the same kernel is implemented in several different ways, the collected performance data can be used to automatically construct an interface library that dispatches each function call to the fastest available variant. The benchmark examples in this article are mostly functions from the dense linear algebra BLAS API. However, BTL++ can be applied to any kernel that can be called as a function from a C++ main program. Within the same framework, we compare different implementations of the benchmarked operations, ranging from libraries such as ATLAS, through procedural solutions in Fortran and C, to more recent C++ libraries with a higher level of abstraction. Results for both single-threaded and multi-threaded computations are included.
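The dispatch idea described above can be illustrated with a minimal C++ sketch. It is not BTL++'s code or API. Everything in it is a hypothetical choice made for illustration: the kernel (BLAS-style axpy, y ← a·x + y), the two variants, the names `time_variant`, `build_dispatch_table` and `axpy`, the benchmark sizes, and the best-of-N timing rule. The abstract describes generating an interface library from collected performance data. This sketch is simpler: it benchmarks the variants at first use and keeps the winner for each problem size in an in-memory table.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical kernel signature (not BTL++'s API): y <- a*x + y.
using AxpyFn = void (*)(std::size_t n, double a, const double* x, double* y);

// Variant 1: a plain indexed loop, as one might write in C.
void axpy_loop(std::size_t n, double a, const double* x, double* y) {
    for (std::size_t i = 0; i < n; ++i) y[i] += a * x[i];
}

// Variant 2: the same operation expressed with std::transform.
void axpy_transform(std::size_t n, double a, const double* x, double* y) {
    std::transform(x, x + n, y, y,
                   [a](double xi, double yi) { return a * xi + yi; });
}

// All candidate implementations of the same kernel.
const std::vector<AxpyFn> kVariants = {axpy_loop, axpy_transform};

// Best-of-`repeats` wall-clock time of one variant at size n, in seconds.
// Taking the minimum damps noise from interrupts and cache warm-up.
double time_variant(AxpyFn fn, std::size_t n, int repeats) {
    std::vector<double> x(n, 1.0), y(n, 0.0);
    double best = std::numeric_limits<double>::max();
    for (int r = 0; r < repeats; ++r) {
        auto t0 = std::chrono::steady_clock::now();
        fn(n, 0.5, x.data(), y.data());
        auto t1 = std::chrono::steady_clock::now();
        best = std::min(best, std::chrono::duration<double>(t1 - t0).count());
    }
    return best;
}

// For sizes up to max_n, use kVariants[index].
struct DispatchEntry {
    std::size_t max_n;
    std::size_t index;
};

// Benchmark every variant at each size (sizes sorted ascending) and
// record the index of the fastest variant for that size.
std::vector<DispatchEntry> build_dispatch_table(
        const std::vector<std::size_t>& sizes, int repeats) {
    assert(!kVariants.empty() && !sizes.empty());
    std::vector<DispatchEntry> table;
    for (std::size_t n : sizes) {
        std::size_t best_index = 0;
        double best_time = std::numeric_limits<double>::max();
        for (std::size_t i = 0; i < kVariants.size(); ++i) {
            double t = time_variant(kVariants[i], n, repeats);
            if (t < best_time) { best_time = t; best_index = i; }
        }
        table.push_back({n, best_index});
    }
    return table;
}

// Interface function: forwards each call to the variant that won at the
// smallest benchmarked size >= n, or at the largest size if n exceeds all.
void axpy(std::size_t n, double a, const double* x, double* y) {
    static const std::vector<DispatchEntry> table =
        build_dispatch_table({16, 1024, 65536}, 5);
    std::size_t index = table.back().index;
    for (const DispatchEntry& e : table) {
        if (n <= e.max_n) { index = e.index; break; }
    }
    kVariants[index](n, a, x, y);
}
```

Callers only see `axpy`. Adding another implementation means appending its function pointer to `kVariants`. Because the table is keyed by problem size, different variants can win for small and large vectors. This mirrors the observation that no single implementation is fastest across all problem sizes.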