In this paper, we discuss the GPU-based implementation and optimization of Householder bidiagonalization, a matrix factorization method that is an integral part of the full Singular Value Decomposition (SVD), an important algorithm for many problems in the research domain of Multimedia Content Analysis (MMCA). On cluster computers, complex adaptive run-time techniques often must be implemented to overcome the growing negative performance impact of load imbalances and to ensure reasonable speedup. We show that the nature of the many-core platform removes the need for such complex run-time parallelization techniques in software, while achieving a performance of 64 gigaflop/s in double precision on a single GPU of a GTX 295, 82% of the theoretical peak performance.
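
For readers unfamiliar with the factorization itself, the following is a minimal sequential sketch of the textbook (Golub-Kahan) Householder bidiagonalization in plain Python. It is not the GPU implementation discussed in the paper, and the function names `householder_vector` and `bidiagonalize` are hypothetical, chosen only for illustration. The sketch assumes a dense matrix with at least as many rows as columns, stored as a list of rows, and does not accumulate the orthogonal factors.

```python
import math


def householder_vector(x):
    """Return a unit vector v such that (I - 2 v v^T) x is a multiple of e_1.

    Returns None when x is the zero vector (nothing to reflect).
    """
    norm_x = math.sqrt(sum(t * t for t in x))
    if norm_x == 0.0:
        return None
    v = list(x)
    # Add the norm with the sign of x[0] to avoid cancellation.
    v[0] += math.copysign(norm_x, x[0])
    norm_v = math.sqrt(sum(t * t for t in v))
    return [t / norm_v for t in v]


def bidiagonalize(a):
    """Reduce an m x n matrix (m >= n) to upper bidiagonal form.

    Returns a new matrix B = U^T A V, where U and V are products of
    Householder reflections. U and V themselves are not accumulated.
    """
    b = [[float(t) for t in row] for row in a]
    m, n = len(b), len(b[0])
    for k in range(n):
        # Left reflection: zero column k below the diagonal.
        v = householder_vector([b[i][k] for i in range(k, m)])
        if v is not None:
            for j in range(k, n):
                dot = sum(v[i - k] * b[i][j] for i in range(k, m))
                for i in range(k, m):
                    b[i][j] -= 2.0 * dot * v[i - k]
        # Right reflection: zero row k to the right of the superdiagonal.
        if k < n - 2:
            v = householder_vector(b[k][k + 1:])
            if v is not None:
                for i in range(k, m):
                    dot = sum(v[j - k - 1] * b[i][j] for j in range(k + 1, n))
                    for j in range(k + 1, n):
                        b[i][j] -= 2.0 * dot * v[j - k - 1]
    return b
```

Calling `bidiagonalize([[1, 2, 3], [4, 5, 6], [7, 8, 10], [2, 1, 0]])` returns a 4 x 3 matrix whose entries outside the main diagonal and first superdiagonal are zero up to rounding error. Because only orthogonal reflections are applied, the result has the same singular values as the input, which is why this reduction serves as the first stage of a full SVD. Each reflection is a matrix-vector product followed by a rank-one update, the kind of regular, data-parallel work that maps naturally onto many-core hardware.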