Recent microprocessors exhibit performance for 32 bit floating point arithmetic (single precision) that is substantially higher than for 64 bit floating point arithmetic (double precision). Examples include Intel's Pentium IV and Pentium M processors, AMD's Opteron architectures, and the IBM Cell Broadband Engine processor. When working in single precision, floating point operations can be performed up to two times faster on the Pentium and up to ten times faster on the Cell than in double precision. These performance gains come from extensions to the base architecture, such as the SSE2 instructions on the Pentium and the vector units on the IBM Cell. The motivation for this paper is to exploit single precision operations whenever possible and to resort to double precision only at critical stages, while still delivering results of full double precision accuracy. The approach described here is fairly general and applies to various problems in linear algebra, such as solving large sparse systems by direct or iterative methods and some eigenvalue problems. The process has limitations: when the condition number of the problem exceeds the reciprocal of the accuracy of the single precision computations, the refinement will not converge, and the full double precision algorithm should be used instead.
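The idea can be sketched as mixed-precision iterative refinement: the expensive O(n^3) solve runs in single precision, while only the O(n^2) residual computation and solution update run in double precision. The following is a minimal illustration in NumPy, not the paper's LAPACK-based implementation; the function name and the well-conditioned test matrix are illustrative choices, and a production code would factor the matrix once in single precision and reuse the factors in each refinement step rather than calling `solve` repeatedly.

```python
import numpy as np

def mixed_precision_solve(A, b, tol=1e-12, max_iter=30):
    """Solve Ax = b to double precision accuracy using a single
    precision inner solver plus double precision refinement.

    A sketch of the technique described in the abstract; a real
    implementation would LU-factor A32 once and reuse the factors.
    """
    A32 = A.astype(np.float32)
    # Initial solve entirely in single precision.
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(max_iter):
        r = b - A @ x                      # residual in double precision
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
        # Correction from the single precision solver.
        d = np.linalg.solve(A32, r.astype(np.float32))
        x = x + d.astype(np.float64)       # update in double precision
    return x

rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n)) + n * np.eye(n)  # well conditioned
b = rng.standard_normal(n)
x = mixed_precision_solve(A, b)
print(np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```

For a well-conditioned matrix like the one above, a few refinement steps bring the relative residual down to double precision levels; if the condition number of `A` approached the reciprocal of single precision unit roundoff (about 1/6e-8), the inner float32 solve would be too inaccurate and the iteration would stall, which is exactly the limitation noted above.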