This paper introduces a novel implementation for reducing a dense symmetric matrix to tridiagonal form, the preprocessing step in solving symmetric eigenvalue problems. Based on tile algorithms, the reduction follows a two-stage approach: the tile matrix is first reduced to symmetric band form and then to the final tridiagonal structure. The challenging trade-off between algorithmic performance and task granularity is addressed through a grouping technique, which aggregates fine-grained, memory-aware computational tasks during both stages while sustaining the application's overall high performance. A dynamic runtime system then schedules the tasks in an out-of-order fashion. The performance reported in this paper for the tridiagonal reduction is unprecedented. Our implementation achieves up to 50-fold and 12-fold improvements (130 Gflop/s) over the equivalent routines from LAPACK V3.2 and Intel MKL V10.3, respectively, on an eight-socket hexa-core AMD Opteron shared-memory multicore system with a matrix of size 24000 × 24000.
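The two-stage idea (dense to band, then band to tridiagonal) can be illustrated with a minimal pure-Python sketch. It is not the paper's implementation: it uses plain Givens rotations rather than tile Householder kernels, it has no task grouping or runtime scheduling, and its second stage does not exploit the band structure the way a bulge-chasing reduction would. The names `givens_band_reduce` and `bandwidth`, the Hilbert test matrix, and the bandwidth of 3 are all assumptions made for illustration.

```python
import math


def givens_band_reduce(a, b):
    """Return a copy of the symmetric matrix `a` (list of lists) reduced to
    bandwidth `b` by orthogonal similarity transforms (Givens rotations).
    Illustrative helper, not a routine from any library."""
    n = len(a)
    t = [row[:] for row in a]
    for j in range(n - b - 1):
        # Zero column j below the band, working from the bottom row upward.
        for i in range(n - 1, j + b, -1):
            x, y = t[i - 1][j], t[i][j]
            if y == 0.0:
                continue  # entry is already zero, no rotation needed
            r = math.hypot(x, y)
            c, s = x / r, y / r
            # Apply the rotation from the left to rows i-1 and i.
            for k in range(n):
                u, v = t[i - 1][k], t[i][k]
                t[i - 1][k] = c * u + s * v
                t[i][k] = -s * u + c * v
            # Apply its transpose from the right to columns i-1 and i.
            for k in range(n):
                u, v = t[k][i - 1], t[k][i]
                t[k][i - 1] = c * u + s * v
                t[k][i] = -s * u + c * v
            # Clear the rounding residue left in the annihilated entries.
            t[i][j] = 0.0
            t[j][i] = 0.0
    return t


def bandwidth(t, tol=1e-12):
    """Largest |i - j| over entries whose magnitude exceeds `tol`."""
    n = len(t)
    return max((abs(i - j) for i in range(n) for j in range(n)
                if abs(t[i][j]) > tol), default=0)


# Hypothetical test data: an 8 x 8 Hilbert matrix (symmetric and dense).
n = 8
dense = [[1.0 / (i + j + 1) for j in range(n)] for i in range(n)]

band = givens_band_reduce(dense, 3)  # stage 1: dense -> band (assumed b = 3)
tri = givens_band_reduce(band, 1)    # stage 2: band -> tridiagonal
print(bandwidth(dense), bandwidth(band), bandwidth(tri))
```

Because every step is an orthogonal similarity transform, the tridiagonal result has the same eigenvalues as the original matrix. In the paper's setting, the benefit of splitting the reduction comes from how the work in each stage is organized into tile tasks, which this sketch does not attempt to model.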