Scientific and Engineering C++: An Introduction with Advanced Techniques and Examples
An updated set of basic linear algebra subprograms (BLAS). ACM Transactions on Mathematical Software (TOMS)
C++ Templates
Minimizing development and maintenance costs in supporting persistently optimized BLAS. Software—Practice & Experience - Research Articles
deal.II—A general-purpose object-oriented finite element library. ACM Transactions on Mathematical Software (TOMS)
Intel Threading Building Blocks
This paper describes a short and simple technique for improving the performance of vector operations (e.g. X = aY + bZ + ...) applied to large vectors. In a previous paper [1], we described how to take advantage of the high-performance vector copy operation provided by the ATLAS library [2] within the C++ Expression Template (ET) mechanism. Here we present a multi-threaded implementation of this approach. The proposed ET implementation, which uses a parallel blocking technique, yields a significant performance increase over existing implementations (up to 2.7x) on dual-socket x86_64 targets.