The unique architecture of the heterogeneous multicore Cell processor offers great potential for high performance computing. It offers features such as high memory bandwidth using DMA, user-managed local stores, and SIMD architecture. In this paper, we present strategies for leveraging these features to develop a high performance BLAS library. We propose techniques to partition and distribute data across SPEs for handling DMA efficiently. We show that suitable preprocessing of data leads to significant performance improvements when the data is unaligned. In addition, we use a combination of two kernels, a specialized high performance kernel for the more frequently occurring cases and a generic kernel for handling boundary cases, to obtain better performance. Using these techniques for double precision, we obtain up to 70-80% of peak performance for different memory bandwidth bound level 1 and 2 routines and up to 80-90% for computation bound level 3 routines.
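The two-kernel idea mentioned above can be illustrated with a minimal sketch. This is not the paper's implementation: it is plain Python rather than SPE code, and the names `BLOCK`, `axpy_fast_kernel`, `axpy_generic_kernel`, and `axpy` are hypothetical, chosen only for illustration. It assumes a level 1 AXPY operation (y := alpha * x + y) in which the bulk of the vector, a multiple of a fixed block width, goes to a specialized kernel and the leftover boundary elements go to a generic one.

```python
# Hypothetical block width, standing in for a SIMD- or DMA-friendly size.
BLOCK = 4


def axpy_fast_kernel(alpha, x, y, start, stop):
    """Specialized kernel: requires (stop - start) to be a multiple of BLOCK."""
    assert (stop - start) % BLOCK == 0
    for i in range(start, stop, BLOCK):
        # Unrolled by hand to mimic a fixed-width vector operation.
        y[i] += alpha * x[i]
        y[i + 1] += alpha * x[i + 1]
        y[i + 2] += alpha * x[i + 2]
        y[i + 3] += alpha * x[i + 3]


def axpy_generic_kernel(alpha, x, y, start, stop):
    """Generic kernel: handles any range, one element at a time."""
    for i in range(start, stop):
        y[i] += alpha * x[i]


def axpy(alpha, x, y):
    """Compute y := alpha * x + y in place, dispatching to the two kernels."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    bulk = n - (n % BLOCK)                      # largest multiple of BLOCK <= n
    axpy_fast_kernel(alpha, x, y, 0, bulk)      # frequently occurring case
    axpy_generic_kernel(alpha, x, y, bulk, n)   # boundary case
    return y
```

In this sketch the fast kernel can assume a fixed shape and skip per-element bounds handling, while the generic kernel absorbs whatever does not fit. On real hardware the split would likely also depend on alignment, which this example does not model.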